International Academic 
Research Center - USC 


Francesco Delvecchio 
Giuseppe Delvecchio 
Francesco Domenico d’Ovidio 


Genetic Scaling 
for Ordinal Variables 


IARC-USC - 2020 - ETQA


© Copyright 2020: International Academic Research Center Str. 
& European Tourism Quality Association sbl 


ISBN: 978-2-931089-06-4


Summary


Abstract

Francesco Delvecchio, Francesco D. d’Ovidio
1. Notes on some quantification methods for ordinal data

Giuseppe Delvecchio, Francesco D. d’Ovidio
2. Determination of the constrained maximums and minimums by genetic algorithms to study the quantification of ordinal variables

Francesco Delvecchio, Giuseppe Delvecchio
3. The Genetic Scaling as a tool for the comparison of averages of data measured in ordinal scale

Francesco D. d’Ovidio
4. Some conclusive considerations

Bibliographic references

Appendix: Matlab Code
(credits: G. Delvecchio, University of Bari, Italy)
[Program names illegible in the source]

Matlab modules (minimally edited by the Authors)

[Module names illegible in the source]


Genetic scaling for ordinal variables, ISBN: 978-2-931089-06-4 5 


Abstract 


This book aims to propose a method to quantify ordinal variables 
through the optimization of an objective function. 

There are various methods for quantifying (scaling) ordinal statisti- 
cal variables, but if the researcher wishes to make comparisons between 
two or more groups regarding the same feature, he should optimize the 
differences between their distributions, whether they concern assess- 
ments, attitudes, opinions, or other features. To do this, he needs to op- 
timize linear forms and quadratic forms subject to linear constraints of 
inequality and quadratic constraints of equality. 

This book suggests a solution to the problem, namely the use of genetic algorithms. Genetic algorithms need no information about the gradient of the objective function and, because they search from many points at once, they are far less likely than gradient-based methods to stop at relative (non-absolute) extremes. The book also implements the rules for deciding whether average evaluations are equal or not.

The method used to compare average evaluations expressed at a qualitative ordinal level is described in the first part of the book; the technique is then applied to quantify ordinal statistical variables. Because it follows clearly defined objective rules, the method does not depend on the researcher’s choices, and it is therefore more reliable and more consistent than the other quantification methods already present in the literature. The method is also validated by comparing the opinions expressed by a sample of university graduates about the effectiveness of university education in terms of its usefulness for employment; in particular, the respondents have been divided according to the Faculty they attended as students and to their current job condition.

A broad Appendix is given at the end of this book, including the 
MATLAB® code expressly written to perform this method. 


Keywords: Quantification, Scaling, Ordinal statistical variables, Comparisons, 
Evaluation, Genetic algorithms, Constrained maximums and minimums. 




Francesco Delvecchio, Mathematician and Social Statistician, is Professor 
Emeritus of the University of Bari Aldo Moro (Italy) with almost 50 
years of activity in the same university (since 1958, being assistant 
to Tommaso Salvemini). In his long career, he wrote almost a hun- 
dred papers and several books. 


Giuseppe Delvecchio, Engineering MS and PhD, scholar of mathematical algorithms, is Engineer at the Technical Structure of the University of Bari Aldo Moro (Italy). In his scientific career, he has written over twenty papers.


Francesco Domenico d’Ovidio, Social Statistician, specialized in Services Assessment, is Associate Professor at the University of Bari Aldo Moro (Italy). In his career, he has written over a hundred papers in various scientific fields, and some books.




1 
Notes on some quantification methods 
for ordinal data 


Francesco Delvecchio, Francesco D. d’Ovidio*


1. Introduction: the "statistical problem" of the ordinal scales


When studying social phenomena, for example through the answers to a questionnaire whose items are expressed on an ordinal scale (qualitative verbal modalities, scores or ranks), a fundamental problem must be solved before carrying out the statistical operations and analyses that are allowed with interval scales.

This problem is related to the very nature of ordinal scales. As is known, ordinal variables are not true measures, that is, they do not satisfy the metric properties of measurement scales. In other words, they are rarely able to provide reliable or reproducible measures: a given increase in score can indicate different increases (of the character that the score is meant to measure) in different subjects, in the same subject as conditions change or even, in the same circumstance and for the same subject, at different levels of the feature (generally, the difference between 1=excellent and 2=good is not the same as the difference between 4=bad and 5=very bad).

In order to be used, therefore, ordinal indicators should be transformed into quantitative and continuous "objective measures", "calibrated" (at least virtually) along the entire range of real numbers.

Moreover, in order to be measurable, an indicator must be unidimensional along a theoretical gradient "from less to more", which obviously corresponds to an equally monotonic description of the phenomena.

* This chapter was jointly realized by the two authors, but F. Delvecchio wrote sections 2 and 4, while F. d’Ovidio wrote sections 1, 3 and 5.

Ordinal modalities can be converted to a metric scale with different quantification methods, more or less adequate; all those methods, if they are limited to intervening on the single gradient of the categorical variable, belong to the field of unidimensional scaling¹ techniques.


2. Some simple scaling techniques 


Quantification (or scaling) is a procedure through which real numbers are assigned to the modalities of a qualitative ordinal feature. It makes it possible to turn information expressed at a qualitative ordinal level into information expressed at interval level.

In the social sciences, quantification is used to convert evaluations, opinions and attitudes towards a particular stimulus (the subject of the evaluation) into numbers, in accordance with the ordered modalities of a qualitative feature (for instance, the intensity level of satisfaction). This procedure allows the researcher to carry out statistical operations that usually require interval scales, thus obtaining information that is easier to use.

The simplest unidimensional approaches to this problem are the following ones:


2.1 Directly-determined quantification of the observed modalities 


This scaling procedure, which is used when the modalities of the character are not already expressed as numbers (scores or ranks), consists in arbitrarily assigning conventional integers (usually equidistant) to the modalities of the ordered statistical variable. It is frequently used because of its simplicity (it is, for instance, the method that MIUR, the Italian Ministry of Education, University and Research, currently uses to assess university didactics).

¹ As the name itself suggests, unidimensional scaling theory and techniques help the researcher to select some characteristics (categories or items) which, on the basis of empirical evidence, correspond (and are linked) to a single dimension or latent continuum.

However, this technique must be used very carefully, not only because of the several drawbacks of the arbitrary assignment, but also because conventionally assigning equidistant numbers to the modalities implicitly presupposes that they are equally spaced at the semantic level.

This simplifying hypothesis of equidistance between the steps of an ordinal scale derives, of course, from the admission of a lack of knowledge about the position of its modalities on the continuum considered, but it is generally very difficult to support and check².


2.2 Indirectly-determined quantification of the observed modalities 


The indirectly-determined quantification is obtained through hypotheses about the form of the distribution. It assigns to the modalities of an ordinal statistical variable real numbers that are not necessarily equidistant, but arranged so that they follow one another at scale distances closer to the reality observed ex post.

In Psychometrics, the postulate at the basis of this method assumes that each individual has a pre-established position on a continuous intensity scale of subjective evaluations (the so-called "psychological continuum"); it also assumes that each position is statistically detectable, so that the values expressed by the different individuals form a known statistical distribution. Quantification is indirectly determined by the areas under the curve of this distribution, following procedures that Thurstone (1925), Torgerson (1958, 1967) and other Authors applied by presupposing a normal distribution of the data³.

It is evident that this method also includes a subjective component (the choice of the density function that best fits the empirical distribution), even if its theoretical basis seems more convincing than that of the previous one.


More information about these simple quantification methods can be found in Delvecchio, 2015, pp. 420-423.


² Marbach (1974) clearly showed that this simplistic hypothesis is hardly reflected in the actual world.

³ The extension of this procedure to other patterns is, however, advisable: for example, Crocetta and Toma (2003) quantified the modalities by assuming as theoretical models the straight line, the exponential function and the Beta distribution in order to get quality indicators for university didactics.




3. Quantification through "scale construction" 


In social sciences, another type of unidimensional scaling consists of the scale construction methods, i.e. the attempts to indirectly assign scores, through appropriate measurement scales, to both variables (stimuli) and subjects; such scores should be able to measure the attitude or repulsion of the subjects with respect to a latent continuum.

These methods have their own internal consistency, in the sense 
that no other statistical function or model is invoked to obtain data 
scaling; in other words, the quantification of ordinal variables on a 
continuum (psychological, economic, sociological) occurs only 
through items and answers, without any underlying distributive hy- 
pothesis. The scale construction methods were the first proposed in 
order to quantify the attitude of the subjects with respect to a latent 
variable on the basis of responses to K categorical items (typically, 
responses to questionnaires), long before Item-Response Theory 
reached its highest milestone with the Rasch Analysis (Rasch, 1960). 

In fact, the first attempts to measure the actual responses of sub- 
jects under one or more stimuli on a plurality of items, through the 
construction of scales, were made again by Thurstone (1925, 1929), 
followed by Likert (1932) and Guttman (1941). 

The typical situation concerns N subjects who must assign scores 
to K questions administered to them, based on their attitude towards a 
latent factor (unknown to the researcher, who only hypothesizes its 
existence). 

Taking the scaling technique proposed by Thurstone (1929) as a simple example, it should be remembered that the underlying procedure requires a group of experts to propose a large number of items, each with a large number of categories; the central category corresponds to an indifferent attitude towards the latent factor, and the extreme ones to complete agreement and complete disagreement, respectively. This method is based on the hypothesis that the latent factor is normally distributed for each item and, above all, on the hypothesis of "apparently equal" intervals: that is, the items proposed by the experts are assumed to be such that the distance between the attitudes is the same for any two pairs of contiguous categories (e.g. the pairs AB and HJ), thus justifying the coding of the ordered categories with integers.




Starting from the set of items thus constructed, Thurstone's method refers precisely to the (presumed) normal distribution of the single latent variable underlying all the items:

- given $\mathrm{Prob}(X_{ki}=1) = p_{ki}$, the probability that the k-th item causes category i to occur, the probability that at least one of the first i categories occurs is

$$F_k(i) = \sum_{j \in \{1,\dots,i\}} p_{kj};$$

- to each ordered determination $x_{ki}$ of the item there corresponds a value $Z_k = \xi_{ki}$, which defines the quantile associated with the category $X_{ki}=1$ of the variable $X_k$; for each k, $\xi_{ki} < \xi_{k,i+1}$ $(i = 1, \dots, I)$, and for each distribution $Z_k$ the following identities hold:

$$\Phi\!\left(\frac{\xi_{ki}-\mu_k}{\sigma_k}\right) = F_k(i) \;\Rightarrow\; \frac{\xi_{ki}-\mu_k}{\sigma_k} = \Phi^{-1}[F_k(i)] = C_{ki} \qquad (k = 1, \dots, K)$$

where $\Phi(\cdot)$ is the cumulative function of a standardized normal variable, and $\mu_k$ and $\sigma_k$ are, respectively, the mean and the s.d. of the latent normal variable $Z_k$;

- $C_{ki}$ denotes the observable value (over repeated observations) that a standard normal random variable does not exceed with probability $F_k(i)$; averaging all the $C_{ki}$ over the K items, we can then set $\frac{1}{\sigma}(\xi_i - \mu) = \bar{C}_i$ for each category i;

- if we now set

$$\frac{1}{\sigma} = \frac{1}{K}\sum_{k=1}^{K}\frac{1}{\sigma_k}, \qquad \frac{\mu}{\sigma} = \frac{1}{K}\sum_{k=1}^{K}\frac{\mu_k}{\sigma_k} \qquad \text{and} \qquad \bar{C}_i = \frac{1}{K}\sum_{k=1}^{K} C_{ki},$$

we can obtain values $\hat{\xi}_i$ compatible with the "true" (and unknown) latent values $\xi_i$, so that it is possible to replace the original ordinal variables with the $\hat{\xi}_i$ values, which are defined on an interval scale.
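The averaging step above can be sketched in a few lines of Python (not the authors' Matlab code). The function name `thurstone_scale_values` and the input layout (one row of category proportions per item) are assumptions made for this illustration; the sketch only computes the averaged normal quantiles $\bar{C}_i$, and it leaves out the last category, whose cumulative proportion is 1 and has no finite quantile.

```python
from statistics import NormalDist


def thurstone_scale_values(proportions):
    """Averaged normal quantiles C_bar_i for the first I-1 categories.

    proportions[k][i] is the observed share of answers to item k that fall
    in category i (each row sums to 1, every share strictly positive).
    This layout is an assumption of the sketch, not taken from the book.
    """
    std_normal = NormalDist()            # standardized normal variable
    n_items = len(proportions)
    n_categories = len(proportions[0])
    c_bar = []
    for i in range(n_categories - 1):    # F = 1 for the last category
        total = 0.0
        for row in proportions:
            f_ki = sum(row[: i + 1])             # F_k(i): cumulative proportion
            total += std_normal.inv_cdf(f_ki)    # C_ki = Phi^{-1}[F_k(i)]
        c_bar.append(total / n_items)            # average over the K items
    return c_bar


if __name__ == "__main__":
    shares = [[0.1, 0.4, 0.4, 0.1],
              [0.2, 0.3, 0.3, 0.2]]
    print(thurstone_scale_values(shares))
```

The resulting values are increasing in i, because the cumulative proportions are, so they preserve the order of the categories while placing them at unequal distances.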


Among the unidimensional scaling techniques, the Monotone Regression proposed by Kruskal (1964, 1965) is also well known; it is summarized here because its principles are taken up in various multidimensional scaling algorithms (for example, the MDS technique, proposed at the same time by Kruskal himself).




Consider an ordinal dependent variable Y (conventionally represented by a set of scores) $y_1 < y_2 < \dots < y_n$, and a transformation function f(.) of Y, chosen so as to respect the initial ordering, so that the transformed values are, in order, $z_1 < z_2 < \dots < z_n$.

Now, by setting

$$z_j(\beta) = \sum_{s} g_{js}\,\beta_s$$

(where the $g_{js}$ are known numerical values that describe the levels of the explanatory factors, and the $\beta_s$ are transformation coefficients), and indicating with $\bar{z}(\beta)$ the mean of the values $z_j(\beta)$, we can determine, through iterative algorithms, the "direct stress" (Kruskal, 1965):

$$S(f^*;\beta^*) = \min_{f}\,\min_{\beta}\; \frac{\sum_{j=1}^{n}\,[z_j - z_j(\beta)]^2}{\sum_{j=1}^{n}\,[z_j(\beta) - \bar{z}(\beta)]^2},$$

which represents a descriptive measure of the goodness of fit of the monotonic transformation f(.).


4. Quantification based on optimization of an objective function


This method of quantification is also known as indeterminate quanti- 
fication. 

Sometimes, the aim of the researcher is to verify if the individuals 
of a group give evaluations (about the same feature) whose average is 
significantly more positive than the average evaluation given by an- 
other group. In this case, it is possible (and appropriate) to optimize 
the difference between the averages of the two distributions of the 
quantified feature: such difference, as shown afterwards, can be ex- 
pressed in linear forms under some particular constraints. 

Once the reasons for carrying out quantification by means of an objective function have been stated, the method formalizes the conditions that the function must satisfy for solutions to exist (i.e. it defines the set of the quantifying variables).

As a matter of fact, sociologists are generally more interested in the values that some statistics can assume (for instance, the difference between the averages of the same feature observed in two or more groups) than in the quantification itself.

In order to single out one among the innumerable quantifications of a feature, it is therefore necessary to fix a criterion based on the statistics that are of interest in the research.


A theoretical treatment of the optimization of different kinds of objective functions under different kinds of constraints was already proposed by Herzel (1974b). Herzel himself (1974a) formulated a quantification procedure, with an application, for the comparison between the average evaluations expressed by two groups of individuals about the same feature (linear objective function of the quantified feature) under the constraint $\sigma^2 = 1$ (a positive definite quadratic form), thus using linear programming.

The same approach was later taken up by Delvecchio (1984), who used illustrative schemes and a non-vector form in order to make it easier to use; however, even with these schemes the calculations remain very complex, and therefore applications of the method were very rare.

For the maximum or minimum of quadratic objective functions under linear inequality constraints and a positive definite quadratic equality constraint, Herzel (1974a) also formulated a calculation procedure, which the Author himself considered very complicated and laborious and, therefore, very difficult to put into practice (as a matter of fact, we do not know of any application of this procedure). Furthermore, Herzel’s papers give no clear definition of the rules required to decide whether to accept or reject the hypotheses underlying the different objective functions.


5. Specific aim of this book

In this book, we propose to solve the problem of quantification (when it is needed to compare average evaluations, opinions or attitudes expressed by the individuals of two or more groups about the same argument) by using an approach based on genetic algorithms, and we provide the aforementioned decision rules.

Because the method needs computing support and, at this time, no statistical software has suitable modules or procedures to carry out the necessary processing, the appendix of this book shows the original Matlab code (rel. 8) expressly written to perform this method, free for use (credits G. Delvecchio, University of Bari, Italy). Anyone interested is free to translate it into procedures optimized for use in statistical software, for example R.




2 
Determination of the constrained maximums 
and minimums by genetic algorithms to study 
the quantification of ordinal variables 


Giuseppe Delvecchio, Francesco D. d’Ovidio*


1. Introduction 


As previously said, in social research the quantification of ordinal variables is often needed in order to carry out operations and statistical analyses that require data on interval scales.

Among the many approaches suggested, the method that optimizes an objective function subject to constraints is the least subjective one, and so it is repeatable: it gives the same results on the same data, with different operators and at different times. Herzel (1974a, 1974b) proposed a rigorous procedure to optimize different kinds of objective functions with different kinds of constraints.

* This chapter was jointly realized by the two authors, but F. d’Ovidio wrote section 1, while G. Delvecchio wrote sections 2 and 3.

Herzel gave a procedure for quantifying the evaluations expressed by two groups of individuals about the same feature, optimizing a linear form constrained not only by linear inequalities but also by a (positive definite) quadratic form of equality. Indicating with $u_j$ the quantity to associate with the j-th state of the ordinal variable and with $x_j = u_{j+1} - u_j \ge 0$ the distances between these quantities, one gets the system

$$\begin{cases} \bar{u}_1 - \bar{u}_2 = \sum_{j=1}^{k} a_j x_j = \text{Max (min)} \\ x_j \ge 0 \\ \sigma^2 = 1 \end{cases}$$

that is

$$\begin{cases} a'x = \text{Max (min)} \\ x_j \ge 0 \\ x'Bx = 1 \end{cases}$$

which is solved through linear programming.
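To make the two forms of the system concrete, the following Python sketch builds a vector a and a matrix B from the category proportions and checks them against the direct computation of the means and of the variance. The formulas used here, $a_j = F_2(j) - F_1(j)$ and $B_{jl} = F(\min(j,l))\,[1 - F(\max(j,l))]$, where F denotes cumulative proportions, are derived for this illustration from the definitions $x_j = u_{j+1} - u_j$ with $u_1 = 0$; they are an assumption of the sketch and are not taken from Herzel's papers or from the authors' Matlab code. All function names are hypothetical.

```python
def cumulative(p):
    """Cumulative proportions F(1), ..., F(k) for the first k = len(p) - 1 categories."""
    out, running = [], 0.0
    for share in p[:-1]:
        running += share
        out.append(running)
    return out


def linear_coefficients(p1, p2):
    """Vector a such that (mean of group 1) - (mean of group 2) = a'x."""
    return [f2 - f1 for f1, f2 in zip(cumulative(p1), cumulative(p2))]


def variance_matrix(p):
    """Matrix B such that the variance of the scores under distribution p is x'Bx."""
    F = cumulative(p)
    k = len(F)
    return [[F[min(i, l)] * (1.0 - F[max(i, l)]) for l in range(k)]
            for i in range(k)]


def scores_from_distances(x):
    """Scores u_1 = 0, u_{j+1} = u_j + x_j (the origin is arbitrary)."""
    u = [0.0]
    for d in x:
        u.append(u[-1] + d)
    return u


if __name__ == "__main__":
    p1 = [0.1, 0.2, 0.3, 0.4]                      # group 1 proportions
    p2 = [0.4, 0.3, 0.2, 0.1]                      # group 2 proportions
    pooled = [(a + b) / 2 for a, b in zip(p1, p2)]  # equal group sizes assumed
    x = [1.0, 0.5, 2.0]                             # k = 3 distances
    a = linear_coefficients(p1, p2)
    B = variance_matrix(pooled)
    print(sum(ai * xi for ai, xi in zip(a, x)))
    print(sum(x[i] * B[i][l] * x[l] for i in range(3) for l in range(3)))
```

Any non-negative vector x can then be rescaled by $1/\sqrt{x'Bx}$ so that the constraint $x'Bx = 1$ holds.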

Herzel (1974a) also formulates a calculation procedure for the optimization of a quadratic function under linear inequality constraints and a positive definite quadratic equality constraint; the procedure is very complicated (the operator must intervene a posteriori to solve the problem) and therefore very difficult to put into practice. This is the case of the system

$$\begin{cases} (\bar{u}_1 - \bar{u}_2)^2 = \text{Max (min)} \\ x_j \ge 0 \\ \sigma^2 = 1 \end{cases}$$

i.e.

$$\begin{cases} x'Ax = \text{Max (min)} \\ x_j \ge 0 \\ x'Bx = 1 \end{cases}$$

The Author uses the letters A and B for the matrices associated with the two quadratic forms (objective function and quadratic equality constraint, respectively), and solves the problem in a very sophisticated way: he considers the characteristic equation $|B^{-1}A - \lambda I| = 0$ and the similar equations obtained by replacing A and B with their principal minors. Then, among these solutions, he keeps only those that comply with the constraints $x_j \ge 0$ and $x'Bx = 1$: «the maximum and the minimum value that we are searching for are respectively equal to the biggest and to the smallest characteristic root of the associated eigenvectors and whose components have the same sign and are determined so that they can comply with the constraint condition» (Herzel, 1974a, page 35).

As anyone can infer, a lot of computational difficulties occur¹.


¹ The same Author states that «The procedure is certainly elaborate... Anyway the solution of $2^k-1$ characteristic equations is necessary.» (Herzel, 1974b, page 80).

Delvecchio F. and Delvecchio G. (2004), in order to compare the average evaluations expressed by two or more groups of individuals about the same qualitative ordinal feature, optimize the difference of the partial averages (in the case of two groups) or the variance of the partial averages (in the case of more groups) of the quantities associated with the ordinal variables, under the constraints (needed for solutions to exist) that the distances $x_j$ are non-negative and that the marginal distribution has unit variance, thus obtaining the same systems described above. The decision rules are also provided.

This analysis is also replicated in the next Chapter of this book.

The search for the maximum and minimum points of a function is traditionally carried out with methods that use information on the gradient of the function to guide the search. However, if the derivative of the function cannot be calculated, for example because the function is discontinuous, these methods often fail. Such methods are generally known as hill climbing; they are efficient for unimodal functions, but with multimodal functions the drawback is that the peak that is "climbed" may not coincide with the absolute maximum. In fact, the method starts from an arbitrarily chosen initial point and then moves towards the point to be determined (for example, the maximum); the process ends when a peak is reached. For example, if we are searching for the maximum of a function (Figure 1) and P is the initial point, the algorithm finds the point B and stops there: we reach the peak B and not the peak A.


[Plot: a multimodal function on the interval from -1 to 2, with the initial point marked.]

Figure 1. Representation of the limits of the hill-climbing method.
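The limit illustrated by Figure 1 can be reproduced with a minimal Python sketch. The test function `two_peaks` is hypothetical (it is not the function plotted in the figure): it has a lower peak of height 1 at x = -1 and a higher peak of height 2 at x = 1. The climber is a plain fixed-step search, used here only as a simple stand-in for gradient-guided methods.

```python
def two_peaks(x):
    """Hypothetical multimodal function: local peak at x = -1, absolute peak at x = 1."""
    return max(1.0 - (x + 1.0) ** 2, 2.0 - (x - 1.0) ** 2)


def hill_climb(func, x0, step=0.01, max_iter=10_000):
    """Move to the better neighbour until neither neighbour improves the value."""
    x = x0
    for _ in range(max_iter):
        best = max((x - step, x + step), key=func)
        if func(best) <= func(x):   # no strict improvement: a peak was reached
            break
        x = best
    return x


if __name__ == "__main__":
    for start in (-1.5, 0.5):
        peak = hill_climb(two_peaks, start)
        print(start, round(peak, 2), round(two_peaks(peak), 2))
```

Starting to the left of the valley, the search stops on the lower peak; only a start on the right-hand slope reaches the absolute maximum, which is exactly the dependence on the initial point described above.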




In this Chapter we intend to overcome this problem and to search for the absolute maximum or minimum point, rather than a relative one, by tackling the optimization of objective functions (both linear and quadratic)² subject to constraints (both linear inequality constraints and quadratic equality constraints) through an approach based on genetic algorithms. We will first examine the theoretical principles underlying the method and the methodological choices made; then we will describe the adjustments required by this particular problem and some proposals for making the algorithm applicable to the above-mentioned quantification. Moreover, we will implement the rules, already suggested in the paper mentioned above, for deciding whether or not to accept the hypotheses on the equality of average evaluations.

Before outlining the algorithm we will use, it is useful to point out the main differences between Genetic Algorithms and traditional search methods.

According to Chipperfield, Fleming and Fonseca (1995, pages 1-5), genetic algorithms:

1. search a set of points at the same time, not a single point at a time;

2. need no further information (for example, the derivative of the objective function) besides the objective function itself and the corresponding fitness (cf. 2.1.c);

3. make use of stochastic rules, not deterministic ones;

4. work on a transformation of the independent variables (usually a binary representation), rather than on the independent variables themselves.


2. The algorithm 


² It is worth noting that we don’t search for the maximum (minimum) of the total distribution, which can also be multimodal because it is a combination of several distributions, but for the absolute maximum (minimum) of the objective function, which exists according to Weierstrass’ theorem (Herzel, 1974b, p. 56).

Genetic algorithms are a computational method inspired, in its formulation, by both natural selection and genetic variation. Just as in nature individuals survive if they are fit for their environment (natural selection), in the computational model individuals (that is, possible solutions to the problem) survive if they have a high fitness score; moreover, the computational method also provides for the "genetic variation" of the solutions, which is due to mutation and/or crossover of the characteristics of their parents.
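The selection and variation cycle just described can be sketched as follows in Python. This is a deliberately simplified illustration, not the algorithm of the book: it works directly on real numbers instead of a binary representation, keeps the fittest half of the population (truncation selection), mixes two parents with a random weight (arithmetic crossover) and adds Gaussian noise (mutation). The function `two_peaks` and all parameter names and default values are hypothetical.

```python
import random


def two_peaks(x):
    """Hypothetical fitness: local peak at x = -1, absolute peak at x = 1."""
    return max(1.0 - (x + 1.0) ** 2, 2.0 - (x - 1.0) ** 2)


def genetic_maximize(fitness, low, high, pop_size=40, n_parents=20,
                     generations=60, mutation_scale=0.1, seed=0):
    rng = random.Random(seed)
    # Initial generation: points drawn at random in the search interval.
    population = [rng.uniform(low, high) for _ in range(pop_size)]
    history = []                                   # best fitness per generation
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        history.append(fitness(population[0]))
        parents = population[:n_parents]           # selection: the fittest survive
        children = []
        while len(children) < pop_size - n_parents:
            mother, father = rng.sample(parents, 2)
            w = rng.random()
            child = w * mother + (1.0 - w) * father      # crossover
            child += rng.gauss(0.0, mutation_scale)      # mutation
            children.append(min(max(child, low), high))  # stay in the domain
        population = parents + children            # population size is constant
    return max(population, key=fitness), history


if __name__ == "__main__":
    best, history = genetic_maximize(two_peaks, -2.0, 2.0)
    print(round(best, 3), round(two_peaks(best), 3))
```

Because the parents are carried over unchanged, the best fitness never decreases from one generation to the next; the search starts from many points at once, so it does not depend on a single initial point.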

To build a computational model suitable for solving the problem of the quantification of ordinal variables through the optimization of an objective function subject to constraints, as explained in the introduction, we have used the Matlab package³.

After the calculation of a, A and B, the algorithm is made up of the following steps.


2.1 Step 1: initialization 


The process starts by determining:
- the number M of elements that will constitute the initial generation,
- the coordinates of these elements,
- the fitness function by which the best individuals can be chosen,
- the fitness of the M elements of the initial population.


a) Range of the initial generation 

As seen in section 1, the traditional hill-climbing methods search for the maximum and minimum values of a function by arbitrarily fixing an initial point and then moving towards the solution point, while genetic algorithms search on a set of points simultaneously.

To do this, we must choose at random the M individuals constituting the initial generation (in our case, M points of the domain of the function that comply with the constraints) and then draw the m<M individuals we want to mate. In our case, the population size remains equal to M in the following generations as well, because as many new individuals enter the population as leave it because of low fitness.


³ Actually, two programs have been written, one for optimizing the linear objective function and another one for the quadratic function, but the algorithm (including the related subroutines) is the same for both.




There is no rule for fixing the size M of the initial generation, but we must keep in mind that a bad choice can result in excessively long calculation times or in convergence problems. In this work it has been ascertained experimentally that, if

M = 100 (k-1) (1)

(where k is equal to the number of qualitative modalities minus one), the convergence times of the algorithm improve.


b) Criterion for determining the coordinates of the elements of the ini- 
tial generation 

Since it is necessary to define the domain in which we work, and 
seeing that it is difficult to locate this domain in Cartesian coordinates 
because of the presence of both a quadratic equality constraint (which 
gives rise, in $S^k$, to a hyperellipsoid) and linear inequality constraints 
(which restrict the domain to only one “segment” of this hyperellipsoid), 
expressed by: 

$$x_j \ge 0, \qquad \sigma^2 = x'Bx = 1,$$

we have thought of getting round the problem by resorting to polar 
coordinates in $S^k$ (Ghizzetti, 1952), defined, in our case, by 


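$$
\begin{aligned}
x_1 &= r \sin\theta_1 \sin\theta_2 \sin\theta_3 \cdots \sin\theta_{k-3} \sin\theta_{k-2} \cos\theta_{k-1} \\
x_2 &= r \sin\theta_1 \sin\theta_2 \sin\theta_3 \cdots \sin\theta_{k-3} \sin\theta_{k-2} \sin\theta_{k-1} \\
x_3 &= r \sin\theta_1 \sin\theta_2 \sin\theta_3 \cdots \sin\theta_{k-3} \cos\theta_{k-2} \\
x_4 &= r \sin\theta_1 \sin\theta_2 \sin\theta_3 \cdots \cos\theta_{k-3} \\
&\;\;\vdots \\
x_{k-2} &= r \sin\theta_1 \sin\theta_2 \cos\theta_3 \\
x_{k-1} &= r \sin\theta_1 \cos\theta_2 \\
x_k &= r \cos\theta_1
\end{aligned}
\qquad (2)
$$

with $r > 0$, $0 \le \theta_i \le \pi/2$, $i = 1, \dots, k-1$. 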


In this way, the inequality constraints $x_j \ge 0$ (which define an 
unbounded region of $S^k$) are transformed into the constraints 
$0 \le \theta_i \le \pi/2$, with the addition of the constraint $r > 0$ 
taken from the equation of the quadratic constraint. 




To this end, for each of the M individuals, we arbitrarily fix a 
(k−1)-tuple of angles ($\theta_1, \theta_2, \dots, \theta_{k-1}$), with 
$0 \le \theta_i \le \pi/2$, and we get r (that is, the k-th polar 
coordinate) from the equation of the quadratic constraint (so as to 
impose that the point lies on the hyperellipsoid). 

For each individual, the value of the radius r is obtained in the 
following way. 

Having fixed the values of the angles ($\theta_1, \theta_2, \dots, \theta_{k-1}$), 
referring to (2), we get the auxiliary vector y = x/r. 

From the equality constraint we can write 

$$x'Bx = r^2\, y'By = 1, \qquad \text{hence} \qquad r = \frac{1}{\sqrt{y'By}}. \qquad (3)$$


Since B is a positive definite symmetric matrix, the equality 
constraint describes a hyperellipsoid. 

When k=3 we obtain an ellipsoid (cf. Figure 2). 

Besides, it is worth noting that, since we have imposed 
$0 \le \theta_i \le \pi/2$ to comply with the inequality constraints, the 
domain of our objective function is the “portion” of the hyperellipsoid 
lying in the part of the hyperspace with non-negative coordinates 
(cf. Figure 3). 

Finally, by means of (2), the point ($\theta_1, \theta_2, \dots, \theta_{k-1}, r$) 
is transformed into the point ($x_1, x_2, \dots, x_k$). 
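The passage from the angles to a feasible point can be sketched as follows. This is a minimal Python illustration, not the authors' Matlab code: the function names and the 3×3 matrix `B_EXAMPLE` are hypothetical, and the sketch assumes k ≥ 3 so that the pattern of (2) applies.

```python
import math

def unit_direction(thetas):
    """Auxiliary vector y = x / r built from the angles as in (2); assumes k >= 3."""
    k = len(thetas) + 1
    if k < 3:
        raise ValueError("this sketch assumes at least two angles (k >= 3)")
    y = [0.0] * k
    prod = 1.0  # running product sin(theta_1) * ... * sin(theta_i)
    for i in range(k - 2):
        y[k - 1 - i] = prod * math.cos(thetas[i])  # fills x_k, x_{k-1}, ..., x_3
        prod *= math.sin(thetas[i])
    y[0] = prod * math.cos(thetas[-1])  # x_1 ends with cos(theta_{k-1})
    y[1] = prod * math.sin(thetas[-1])  # x_2 ends with sin(theta_{k-1})
    return y

def quadratic_form(B, v):
    """Value of v'Bv for a square matrix given as a list of rows."""
    n = len(v)
    return sum(v[i] * B[i][j] * v[j] for i in range(n) for j in range(n))

def to_cartesian(thetas, B):
    """Point x = r * y lying on the hyperellipsoid x'Bx = 1, with r taken from (3)."""
    y = unit_direction(thetas)
    r = 1.0 / math.sqrt(quadratic_form(B, y))
    return [r * c for c in y]

# Hypothetical positive definite matrix, for illustration only.
B_EXAMPLE = [[2.0, 0.5, 0.0], [0.5, 1.0, 0.2], [0.0, 0.2, 3.0]]
x_example = to_cartesian([0.7, 1.1], B_EXAMPLE)
```

Because every angle lies in [0, π/2], all sines and cosines are non-negative, so the point returned satisfies both constraints by construction and no feasibility check is needed afterwards.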


Figure 2. Ellipsoid $\sigma^2 = 1$ in the Herzel example, 1974a (k=3). 




Figure 3. Segment of the ellipsoid $\sigma^2 = 1$ when $x_i \ge 0$, in the Herzel example, 1974a (k=3). 


Before going on, maybe it is useful to show in which region of the 
space the points have coordinates which represent all the possible quan- 
tifications (of which we want to determine the maximum and minimum 
ones), in the case of the linear and quadratic objective functions, found 
by Herzel in his paper (cf. Figure 4 and Figure 5, respectively). 


Figure 4. Linear objective function $a'x$ subject to the constraints $\sigma^2 = 1$ and $x_i \ge 0$, 
in the Herzel example, 1974a (k=3). 




Figure 5. Quadratic objective function $x'Ax$ subject to the constraints $\sigma^2 = 1$ and $x_i \ge 0$, 
in the Herzel example, 1974a (k=3). 


c) Determination of the fitness function 
A specific fitness function must be created for each problem to solve, so 
each point is associated with one fitness score which is assumed to be pro- 
portionate to the fitness of the individual to survive, that is to the fitness 
of the point to represent the solution. 

In our case, it is assumed that the fitness function ($f_{fit}$) is a linear 
transformation of the objective function ($f_{obj}$)⁴, that is 

$$f_{fit}(x_1, \dots, x_k) = k_1\, f_{obj}(x_1, \dots, x_k) + k_2. \qquad (4)$$


The $k_1$ parameter is a scale factor which is used to improve the 
convergence and to avoid either “early convergence” (in fact, a few 
individuals with a high, non-optimum fitness can quickly prevail over the 
rest of the population, thus leading the convergence towards a local 
maximum) or the “slow end” (if the average fitness is high and differs 
little from that of the best individual, it is difficult for the genetic 
algorithm to locate the latter). In our case $k_1 = 100$ has been used to 
maximize the objective function, $k_1 = -100$ to minimize it. 


4 Indeed, during the design phase of the algorithm, the objective function 
itself was used as fitness, as in Goldberg (1989), but the algorithm did not 
converge. 




The k2 parameter is a threshold value which is used to avoid nega- 
tive fitness. In our case we have chosen k2=100. 
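The linear transformation (4) can be sketched in a few lines. This is a Python illustration under stated assumptions: `make_fitness`, `linear_objective` and the coefficient vector `A_EXAMPLE` are hypothetical names and values, while the constants 100, −100 and 100 are those given in the text.

```python
def make_fitness(objective, maximize=True, scale=100.0, threshold=100.0):
    """Return f_fit(x) = k1 * f_obj(x) + k2, with k1 = +scale or -scale."""
    k1 = scale if maximize else -scale  # sign flips when minimizing
    k2 = threshold                      # shifts the fitness away from negative values
    def fitness(x):
        return k1 * objective(x) + k2
    return fitness

# Hypothetical linear objective a'x, for illustration only.
A_EXAMPLE = [0.15, 0.20, 0.15]

def linear_objective(x):
    return sum(a * v for a, v in zip(A_EXAMPLE, x))

fit_max = make_fitness(linear_objective, maximize=True)
fit_min = make_fitness(linear_objective, maximize=False)
```

With this convention the genetic algorithm always looks for the highest fitness: minimizing the objective function is obtained only by changing the sign of the scale factor.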


d) Calculation of the fitness of the initial generation elements 
Substituting ($x_1, x_2, \dots, x_k$) into (4), we get the fitness value of 
each element of the initial generation. 


2.2 Step 2: choice of the individuals for the reproduction 


Selection is the act of drawing, from a population of M individuals, 
m ≤ M individuals with a probability proportionate to their fitness. These 
individuals reproduce and produce offspring who will be included in the 
next generation. 

The m parents are chosen at random using a pattern that favours the 
best individuals: in fact, as they have been chosen with a probability 
proportionate to their fitness, most of them have a high fitness. The 
good individuals can be chosen several times for the reproduction, 
while the worst ones can never be chosen. 

In this paper, using an idea we have already suggested in another 
paper (Delvecchio, Neri and Sylos Labini, 2002), the individuals are all 
chosen (m = M) and never replicated⁵. 


2.3 Step 3: transformation of the polar coordinates into binary num- 
bers 


For each individual, we transform the real numbers constituting the 
(k−1)-tuple of angles ($\theta_1, \theta_2, \dots, \theta_{k-1}$) into binary 
numbers. In particular, once the number of bits $n_{bit}$ has been fixed, it 
is possible to represent the integers from 0 to $2^{n_{bit}} - 1$ in the 
binary system; the resolution is found by subdividing the range of the 
angles $[0, \pi/2]$ into $2^{n_{bit}} - 1$ parts. 

In our case, we fix a length of 30 bits so as to get a resolution equal to 

$$\Delta\theta = \frac{\pi/2}{2^{30} - 1} = 0.1463 \cdot 10^{-8} \text{ rad}. \qquad (5)$$
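The encoding of one angle can be sketched as follows. This is a Python illustration rather than the authors' implementation; `encode_angle` and `decode_angle` are hypothetical names, and rounding to the nearest grid point is an assumption, since the text does not say how an angle is mapped onto the grid.

```python
import math

N_BITS = 30                                   # string length used in the text
STEP = (math.pi / 2) / (2 ** N_BITS - 1)      # resolution (5), in radians

def encode_angle(theta, n_bits=N_BITS):
    """Map an angle in [0, pi/2] to a fixed-length binary string."""
    levels = 2 ** n_bits - 1
    index = round(theta / (math.pi / 2) * levels)  # nearest grid point (assumption)
    return format(index, "0{}b".format(n_bits))

def decode_angle(bits):
    """Inverse mapping: binary string back to an angle in [0, pi/2]."""
    levels = 2 ** len(bits) - 1
    return int(bits, 2) / levels * (math.pi / 2)

theta_back = decode_angle(encode_angle(0.7))  # differs from 0.7 by at most STEP
```

The same `decode_angle` function also covers Step 7, where the bit strings of the children are turned back into angles.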


5 During the design phase of the algorithm we also tested a kind of selection 
suggested by Goldberg (1989), in which individuals reproduce proportionally to their 
fitness. Nevertheless, the algorithm did not converge. 




2.4 Step 4: creation of the “chromosomes” in the computational 
model 


While in nature the genetic inheritance of an individual is contained in 
the chromosomes which, in turn, are made up of genes situated in par- 
ticular genetic loci, in the computational models the genetic inher- 
itance of a possible solution of the problem, that is all its possible char- 
acteristics, is codified by a string of bits. This string, by analogy, is 
called chromosome, while each individual bit (or small groups of ad- 
jacent bits), called gene, codifies a particular element of the solution 
regarded as being suitable for this purpose. 

The string of bits which characterizes the chromosome, that is the 
solution-individual, is obtained by concatenating the previous binary 
numbers⁶. 

For example, for k=3, if the binary representation of the angles is 
$\theta_1$ = 010110 and $\theta_2$ = 110010, the string of bits associated 
with the individual is: 

010110110010. 
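The concatenation, and the inverse split needed later when the children are decoded, can be sketched in Python as follows. The function names are hypothetical; the six-bit strings are those of the example in the text.

```python
def build_chromosome(angle_bits):
    """Concatenate the binary strings of the k-1 angles into one chromosome."""
    return "".join(angle_bits)

def split_chromosome(chromosome, n_bits):
    """Cut a chromosome back into one binary string per angle."""
    if len(chromosome) % n_bits != 0:
        raise ValueError("chromosome length must be a multiple of n_bits")
    return [chromosome[i:i + n_bits] for i in range(0, len(chromosome), n_bits)]

chromosome = build_chromosome(["010110", "110010"])
genes = split_chromosome(chromosome, 6)
```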


2.5 Step 5: crossover 


In a manner analogous to the way in which, in nature, a part of the 
father’s chromosome is exchanged during sexual reproduction with the 
corresponding part of the mother’s chromosome to originate a child 
chromosome, crossover techniques in a computational model make it 
possible to exchange parts of the strings between two individuals chosen 
from the current generation to give birth to a new individual. 
Crossover is not generally applied to all the couples of individuals 
selected for reproduction (that is, reproduction is not always fertile). 
The individuals are chosen at random and the probability of being subject 
to crossover is typically between 0.6 and 1.0. Crossover takes place 
either when a randomly generated value (roulette value) falls below an 
imposed percentage, i.e. the frequency with which this phenomenon should 
occur, or according to an imposed number of “dead individuals” at each 
iteration. It must be pointed out that in the latter case it 
would be better to break with the analogy with nature and ignore the 

6 During the design phase different routes were also followed, by analogy with 
what was suggested in other papers (Delvecchio, Neri and Sylos Labini, 2002), but the 
algorithm did not converge. 




idea of “death from old age” of the individuals belonging to the 
population. In such a case an individual having a good fitness can 
reproduce for many iterations or even for the whole run of the algorithm. 
If crossover is not applied, children can be generated simply by 
replicating the parents. In this way each individual can reproduce its own 
genes in the next generation without the crossover-induced splitting. 

There are different crossover techniques. The most widespread, also 
because it is the simplest, is the single point crossover (Cammarata, 
1994): after having chosen an index j at random ($1 \le j < n$), genes are 
exchanged so as to generate two child strings, so that from the two 
“parent” strings 

$A_1, A_2, \dots, A_n$ 
$B_1, B_2, \dots, B_n$, 

two “child” strings are obtained: 

$A_1, A_2, \dots, A_j, B_{j+1}, B_{j+2}, \dots, B_n$ 
$B_1, B_2, \dots, B_j, A_{j+1}, A_{j+2}, \dots, A_n$. 


In this paper the single point crossover has been applied to couples 
of individuals chosen at random without reiteration by imposing (i.e. by 
assuming that the probability of crossover happening is equal to 1) that 
the new generation has the same size as the previous one (Covitti, 
Delvecchio, Neri, Sylos Labini, 2003). 


parents    1010001110    0011010010 

children   1010010010    0011001110 


Figure 6. An example of single point crossover in the binary 
case: the cut is made between the fourth and fifth gene. 
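The single point crossover, with random pairing without repetition and crossover probability equal to 1, can be sketched in Python as follows. The function names are hypothetical and the pairing by shuffling is an assumption about how the couples are formed; the first example reproduces the strings of Figure 6.

```python
import random

def single_point_crossover(parent_a, parent_b, cut):
    """Swap the tails of two equal-length bit strings after position `cut`."""
    if len(parent_a) != len(parent_b):
        raise ValueError("parents must have the same length")
    if not 1 <= cut < len(parent_a):
        raise ValueError("cut must leave a non-empty head and tail")
    child_a = parent_a[:cut] + parent_b[cut:]
    child_b = parent_b[:cut] + parent_a[cut:]
    return child_a, child_b

def mate_population(population, rng):
    """Pair individuals at random without repetition; every couple is crossed."""
    order = list(population)
    rng.shuffle(order)
    children = []
    for a, b in zip(order[0::2], order[1::2]):
        cut = rng.randint(1, len(a) - 1)  # randint includes both endpoints
        children.extend(single_point_crossover(a, b, cut))
    return children

figure_6 = single_point_crossover("1010001110", "0011010010", cut=4)
```

Since each couple yields two children, an even-sized population produces a new generation of the same size, as required in the text.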


2.6 Step 6: mutation 


Since in nature the mutation of a gene is a rare phenomenon, in the 
proposed model only some children will be subject to mutation. At each 
generation, these few individuals are drawn among all children with a 
low drawing probability. In practice, a number is drawn at random in the 
interval [0; 1] for each individual, and the individual is selected if 
this number lies between 0 and 0.04. 

Once the individual is drawn, one of its genes (that is, one of 
the bits) to be changed is also chosen at random, by working on the 
string of bits characterizing the individual: after having chosen an index 
j ($1 \le j \le n$) at random, the symbol $A_j$ is replaced with a new symbol 
$A'_j$, drawn at random from the alphabet of the genes (in the binary case, 
0 or 1: see Figure 7). 


before mutation    10101010 

after mutation     10111010 
Figure 7. An example of mutation on the fourth gene in the binary case. 


Hence the transformation (Cammarata, 1994): 

$A_1, \dots, A_j, \dots, A_n \;\rightarrow\; A_1, \dots, A'_j, \dots, A_n$. 
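The mutation step can be sketched in Python as follows. The function name is hypothetical; the threshold 0.04 is the one given in the text, and, as described there, the new symbol is drawn from the alphabet {0, 1}, so it may coincide with the old one.

```python
import random

def mutate(chromosome, rate=0.04, rng=random):
    """With probability `rate`, redraw one randomly chosen bit of the string."""
    if rng.random() >= rate:
        return chromosome                 # individual not selected for mutation
    j = rng.randrange(len(chromosome))    # position of the gene to change
    new_symbol = rng.choice("01")         # drawn from the binary alphabet
    return chromosome[:j] + new_symbol + chromosome[j + 1:]

mutated = mutate("10101010", rate=1.0, rng=random.Random(0))
```

A variant that always flips the chosen bit, as in Figure 7, would replace the random draw with the complement of the current symbol; which of the two the authors' code uses is not stated in this section.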


As far as the search speed is concerned, according to the traditional 
theory, crossover is more important than mutation. Mutation involves a 
sort of “random” search and lets us make sure that all points have a 
chance to be examined. 

Mutation can therefore be considered as an extra opportunity for an 
individual to develop. Obviously, from the opposite point of view, 
mutation may also be ill suited to the evolution of the population; but, 
considering that this phenomenon has a low probability of happening and 
that a well-implemented genetic algorithm will get rid of the poor-quality 
gene at the next iteration, it is possible to infer that, if worst comes 
to worst, mutation could result in a slight slowdown of the algorithm 
but would not in fact endanger its convergence. 


2.7 Step 7: transformation of the bit strings connected with children 
into angles 


Chromosomes (i.e. the bit strings) connected with new-generation 
individuals (i.e. the children) are transformed back into 
(k−1)-tuples of angles ($\theta_1, \theta_2, \dots, \theta_{k-1}$). 




2.8 Step 8: calculation of r for each child 
This is possible through the transformation (3). 


2.9 Step 9: transformation from polar coordinates into Cartesian co- 
ordinates for children 


Having got the value of the radius vector r for each new-born individ- 
ual, it is possible to get the vector x=r y, whose components give the 
Cartesian coordinates of the m points generated, that is those of the chil- 
dren (cf. 2.1.b). 

This transformation is also useful for calculating the children fitness. 


2.10 Step 10: calculation of fitness for each child 
This is possible through the function (4). 


2.11 Step 11: reinsertion 


When a new population has been produced, the reinsertion operation 
picks out, between the new and old generation, the individuals to be 
used in the next iteration. 

In some models the whole generation⁷ is replaced; in others only 
some individuals of the old generation are replaced with others from the 
new generation, according to pre-arranged criteria⁸. 

Among the individuals of the new and old generation taken together, we 
choose half: that is, those that have the highest fitness values⁹. 
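The reinsertion rule adopted here (keep the better half of parents and children taken together) can be sketched in Python as follows. The function name is hypothetical, and individuals are represented by plain numbers only to keep the example short.

```python
def reinsert(old_generation, new_generation, fitness):
    """Merge parents and children and keep the half with the highest fitness."""
    merged = list(old_generation) + list(new_generation)
    ranked = sorted(merged, key=fitness, reverse=True)  # best individuals first
    return ranked[:len(merged) // 2]

# Toy individuals whose fitness is their own value, for illustration only.
survivors = reinsert([1, 5, 3], [4, 2, 6], fitness=lambda v: v)
```

When both generations contain M individuals, the next generation again contains M individuals, so the population size stays constant.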


7 This occurs in nature in the short-life species, such as some insects, in which parents 
lay eggs and die before their young are born. 

8 This occurs in nature in the longer-lived species in which parents and children live 
together. This allows the parents to raise and educate their children, but they 
also enter into competition with them. In this case, parents must be selected for 
sexual reproduction, but unlucky individuals must also be chosen so that they die and 
leave room for their children. 

9 During the design phase of the algorithm, the whole old generation was 
replaced with the new one, as suggested by Goldberg (1989), but the algorithm did 
not converge. 




2.12 Step 12: stop criterion 


The process is iterated until it is stopped. Following an idea already 
suggested in another paper (Delvecchio, Neri and Sylos Labini, 2002), at 
each iteration the difference between the maximum and average fitness 
is calculated, and when this difference is less than a pre-arranged 
quantity (equal to 0.0001) the process is stopped¹⁰. 

Besides, the algorithm stops (even if it doesn’t comply with the stop 
criterion) if it reaches the maximum number of generations imposed by 
the operator. 
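The two stopping conditions can be sketched in Python as follows. The function name is hypothetical and the default of 1000 generations is an assumed placeholder, since the text leaves the maximum number of generations to the operator; the tolerance 0.0001 is the one given above.

```python
def should_stop(fitness_values, generation, tolerance=1e-4, max_generations=1000):
    """Stop when max fitness and mean fitness nearly coincide, or at the cap."""
    best = max(fitness_values)
    mean = sum(fitness_values) / len(fitness_values)
    converged = (best - mean) < tolerance
    return converged or generation >= max_generations

stop_now = should_stop([120.0, 120.0, 119.99999], generation=12)
```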


Figure 8. Population at the 2nd iteration in the example given by Herzel, 1974a 
(k=3). 


10 During the design phase of the algorithm, the stop criterion we used was the 
difference between the present maximum value and the maximum value obtained in the 
previous five iterations, compared with the present maximum value, as in 
Goldberg (1989). But the algorithm did not converge. 


Figure 9. Population at the 12th iteration in the example given by Herzel, 
1974a (k=3). 


Figure 10. Population at the 22nd iteration in the example given by Herzel, 
1974a (k=3). 


Figure 11. Stop criterion: difference between the maximum and average fitness, 
by generation, in the example given by Herzel, 1974a. 


2.13 Flow chart 


The flow chart of Figure 12 summarizes the information given above. 
Apart from the absolute frequencies of the distributions, the following 
must also be input to the computer program: 
- the size M of the population, according to (1); 
- the number of bits (cf. 2.3), on which the resolution (5) depends; 
- the crossover probability (cf. 2.5); 
- the mutation probability (cf. 2.6); 
- $k_1$, $k_2$ and so the fitness function (cf. 2.1.c); 
- the limiting value used in the stop criterion (cf. 2.12); 
- the maximum number of generations beyond which the algorithm 
stops even if it has not reached convergence (cf. 2.12). 
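The inputs listed above can be gathered in one small structure. This is a Python sketch with hypothetical field names; the defaults are the values reported in this chapter, except `max_generations`, which is an assumed placeholder because the text leaves it to the operator.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class GAParameters:
    k: int                                # number of distances x_j to be quantified
    n_bits: int = 30                      # bits per angle (cf. 2.3)
    crossover_probability: float = 1.0    # cf. 2.5
    mutation_probability: float = 0.04    # cf. 2.6
    k1: float = 100.0                     # scale factor; use -100 to minimize
    k2: float = 100.0                     # threshold avoiding negative fitness
    stop_tolerance: float = 1e-4          # cf. 2.12
    max_generations: int = 1000           # assumption: not fixed in the text

    @property
    def population_size(self):
        """Size M of the population according to (1)."""
        return 100 * (self.k - 1)

    @property
    def chromosome_length(self):
        """One block of n_bits for each of the k-1 angles."""
        return self.n_bits * (self.k - 1)

params = GAParameters(k=3)
```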




Calculation of a (A) and B 

Step 1 - initialization: the coordinates of the M points 
of the initial population and their relative fitness are determined 

Step 2 - selection 

Step 3 - transformation of the angles $\theta_i$ into binary numbers 

Step 4 - creation of the strings representing 
the chromosomes: for ex. 010110110010 

Step 5 - crossover 

Step 6 - mutation 

Step 7 - transformation of the strings of bits associated with children into $\theta_i$ 

Step 8 - calculation of r for each child 

Step 9 - transformation of polar coordinates into 
Cartesian coordinates: ($\theta_1, \theta_2, \dots, \theta_{k-1}, r$) → ($x_1, x_2, \dots, x_k$) 

Step 10 - calculation of the children fitness 

Step 11 - reinsertion 


Figure 12. Flow chart of the genetic algorithm implemented. 




[Flow chart. Recoverable node labels: input $u_j$, $N_{ij}$ (i = 1, 2; j = 1, ..., k+1); 
Cucconi-Kolmogorov test for verifying the normality of the sub-populations; 
normal and non-normal sub-populations; Snedecor’s F-test for verifying the 
homoscedasticity of the distribution; homoscedastic and heteroscedastic 
distribution; Student’s t-test; Welch-Aspin’s v-test; sample sizes $n_1$, $n_2$ 
compared with 100; triad test for verifying the symmetry of the 
sub-populations; symmetrical and asymmetrical sub-populations; equal 
variances and equal functional form hypothesized; Mann-Whitney U-test; 
Fligner-Policello U-test; “nothing can be said”.] 

Figure 13. Flow chart of the algorithm implemented to compare the average 
evaluations expressed by the individuals of two groups about the same feature. 


Moreover, Figure 13 shows the flow chart of the algorithm implemented 
to compare the average evaluations expressed by the individuals of two 
groups about the same feature, obtained through the calculated $u_j$, 
while Figure 14 shows the flow chart of the algorithm implemented to 
compare the average evaluations expressed by the individuals of many 
different groups about the same feature. 


[Flow chart. Recoverable node labels: input $u_j$, $N_{ij}$; Shapiro-Wilk or 
Cucconi-Kolmogorov test to verify the normality of the sub-populations; 
normal and non-normal sub-populations; Bartlett test for verifying the 
homoscedasticity of the distribution; homoscedastic and heteroscedastic 
distribution; variance analysis; Welch test for c samples; Kruskal-Wallis 
K-test; multiple comparison test.] 

Figure 14. Flow chart of the algorithm implemented to compare the average 
evaluations expressed by the individuals of many different groups about the 
same feature. 




3. Conclusions 


It now seems appropriate to point out that models based on genetic 
algorithms can be used in many different situations, just as there are 
different biological models in nature. But they cannot be applied sic et 
simpliciter to any problem, since it is necessary to find the biological 
model that fits the case under study. 

As for the quantification of ordinal variables, for instance, in order 
to obtain the convergence of the algorithm it has been necessary to fix 
the following: the size of the starting population, the arrangement of the 
elements of this population (by a transformation into polar coordinates 
in $S^k$, in order to delimit the domain of the objective function 
and its constraints), the kind of crossover, the reinsertion criterion, etc. 

The model used has therefore been built ad hoc, and so it is not suited 
to solving arbitrary optimization problems. 

As for the goodness of the method suggested, the results obtained 
in the next Chapter of this book are not only more than satisfactory, 
but are also confirmed by the comparison with the data 
given by Herzel in his paper (Herzel, 1974a): the results of our method 
are slightly better than those of Herzel, since we have obtained a 
maximum of the objective function which is slightly higher in the 
maximizing distribution and, similarly, a minimum which is slightly 
smaller in the minimizing distribution. 

Moreover, as pointed out in the introduction, our method is less 
artificial than that of Herzel in determining the solution, because it 
does not require any intervention a posteriori and so it can be used 
more easily. 

To make its application easy, an algorithm has also been implemented 
that, referring to the most suitable statistical tests, makes it possible 
to decide upon the equality of the average evaluations expressed by two 
or more groups of people. 


Genetic scaling for ordinal variables, ISBN: 978-2-931089-06-4 37 


3 
The Genetic Scaling as a tool for the comparison 
of averages of data measured in ordinal scale 


Francesco Delvecchio, Giuseppe Delvecchio * 


1. Basic observations 


Let $u_1 \le u_2 \le \dots \le u_{k+1}$ be the quantitative modes that can be 
associated with the k+1 qualitative modes. In order to decrease the number 
of modes that must be quantified, we will use the distances between 
consecutive modes: 

$$x_j = u_{j+1} - u_j \ge 0 \qquad (j = 1, 2, \dots, k).$$

By varying the index, we will have: 

$$u_{j+1} - u_j = x_j, \quad u_{j+2} - u_{j+1} = x_{j+1}, \quad \dots, \quad u_k - u_{k-1} = x_{k-1}, \quad u_{k+1} - u_k = x_k.$$

If we add member to member and then simplify, we obtain: 

$$u_j = u_{k+1} - \sum_{h=j}^{k} x_h.$$
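The relation between distances and modes can be checked with a short Python sketch. The function name and the numerical values are hypothetical; `u_last` plays the role of $u_{k+1}$, which remains an arbitrary constant.

```python
def modes_from_distances(x, u_last=0.0):
    """Recover u_1, ..., u_{k+1} from the distances x_1, ..., x_k and u_{k+1}."""
    k = len(x)
    u = [0.0] * (k + 1)
    u[k] = u_last
    for j in range(k - 1, -1, -1):   # u_j = u_{j+1} - x_j, from the top down
        u[j] = u[j + 1] - x[j]
    return u

u_example = modes_from_distances([1.0, 2.0, 3.0], u_last=10.0)
```

Non-negative distances give a non-decreasing sequence of modes, which is why the ordering constraint on the $u_j$ reduces to $x_j \ge 0$.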

Without loss of generality, we can consider only two groups 
and use $f_{0j}, f_{1j}, f_{2j}$ and $F_{0j}, F_{1j}, F_{2j}$ to denote, 
respectively, the relative frequencies and the cumulative relative 
frequencies of the total distribution and of the partial distributions. 

Let us now calculate the average $\bar u$ of the total distribution: 

$$
\begin{aligned}
\bar u &= \sum_{j=1}^{k+1} u_j f_{0j} = \sum_{j=1}^{k+1} \Big( u_{k+1} - \sum_{h=j}^{k} x_h \Big) f_{0j} = u_{k+1} - \sum_{j=1}^{k} \sum_{h=j}^{k} x_h f_{0j} \\
&= u_{k+1} - \big[ x_1 f_{01} + x_2 (f_{01} + f_{02}) + \dots + x_k (f_{01} + f_{02} + \dots + f_{0k}) \big] = u_{k+1} - \sum_{j=1}^{k} x_j F_{0j}.
\end{aligned}
$$


* This chapter was jointly written by the two authors, but F. Delvecchio edited 
sections 1, 2 and 9, while G. Delvecchio edited sections 3, 4, 5, 6, 7 and 8. 




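Let us calculate in the same way the partial averages, even in the case 
of a number c > 2 of distributions: 

$$\bar u_1 = u_{k+1} - \sum_{j=1}^{k} x_j F_{1j}, \qquad \bar u_2 = u_{k+1} - \sum_{j=1}^{k} x_j F_{2j}, \qquad \dots, \qquad \bar u_c = u_{k+1} - \sum_{j=1}^{k} x_j F_{cj}.$$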


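Let us now calculate the variance of the partial averages, where $f_{i0}$ 
is the relative weight of the i-th group: 

$$
\begin{aligned}
\sum_{i=1}^{c} (\bar u_i - \bar u)^2 f_{i0}
&= \sum_{i=1}^{c} \Big( u_{k+1} - \sum_{j=1}^{k} x_j F_{ij} - u_{k+1} + \sum_{j=1}^{k} x_j F_{0j} \Big)^2 f_{i0} \\
&= \sum_{i=1}^{c} \Big[ \sum_{j=1}^{k} x_j (F_{0j} - F_{ij}) \Big]^2 f_{i0} \\
&= \sum_{i=1}^{c} \Big[ \sum_{j=1}^{k}\sum_{h=1}^{k} x_j x_h F_{0j} F_{0h} + \sum_{j=1}^{k}\sum_{h=1}^{k} x_j x_h F_{ij} F_{ih} - 2 \sum_{j=1}^{k}\sum_{h=1}^{k} x_j x_h F_{0j} F_{ih} \Big] f_{i0} \\
&= \sum_{j=1}^{k}\sum_{h=1}^{k} x_j x_h F_{0j} F_{0h} + \sum_{j=1}^{k}\sum_{h=1}^{k} x_j x_h \sum_{i=1}^{c} f_{i0} F_{ij} F_{ih} - 2 \sum_{j=1}^{k}\sum_{h=1}^{k} x_j x_h F_{0j} \sum_{i=1}^{c} f_{i0} F_{ih},
\end{aligned}
$$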


where, as it is easy to verify, we have: 

$$\sum_{i=1}^{c} F_{ih} f_{i0} = F_{0h}.$$


So, after all, the variance of the partial averages becomes: 

$$\sum_{i=1}^{c} (\bar u_i - \bar u)^2 f_{i0} = \sum_{j=1}^{k}\sum_{h=1}^{k} x_j x_h \Big( \sum_{i=1}^{c} f_{i0} F_{ij} F_{ih} - F_{0j} F_{0h} \Big) = x'Ax \qquad (1)$$

considering $x = [x_1, \dots, x_k]'$ and A as the positive semi-definite 
symmetrical matrix whose generic element is given by: 

$$a_{jh} = c_{jh} - F_{0j} F_{0h}, \qquad \text{with} \qquad c_{jh} = \sum_{i=1}^{c} f_{i0} F_{ij} F_{ih} \qquad (j, h = 1, \dots, k). \qquad (2)$$
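Formula (2) can be checked numerically with a short Python sketch that builds A from a table of absolute frequencies and compares x'Ax with the variance of the group averages computed directly from the modes. The function names and the frequency table are hypothetical, and $u_{k+1}$ is set to 0 because the arbitrary constant cancels.

```python
def cumulative(values):
    """Running sums of a list."""
    total, out = 0.0, []
    for v in values:
        total += v
        out.append(total)
    return out

def build_A(counts):
    """Matrix A of (2); counts[i][j] is the frequency of mode j in group i."""
    n_total = float(sum(sum(row) for row in counts))
    k = len(counts[0]) - 1
    weights = [sum(row) / n_total for row in counts]                      # f_i0
    F_groups = [cumulative([n / float(sum(row)) for n in row])[:k] for row in counts]
    F_total = [sum(w * F[j] for w, F in zip(weights, F_groups)) for j in range(k)]
    return [[sum(w * F[j] * F[h] for w, F in zip(weights, F_groups))
             - F_total[j] * F_total[h] for h in range(k)] for j in range(k)]

def variance_of_group_means(counts, x):
    """Direct computation from the modes u_j, taking u_{k+1} = 0."""
    k = len(x)
    u = [0.0] * (k + 1)
    for j in range(k - 1, -1, -1):
        u[j] = u[j + 1] - x[j]
    n_total = float(sum(sum(row) for row in counts))
    weights = [sum(row) / n_total for row in counts]
    means = [sum(uj * n for uj, n in zip(u, row)) / float(sum(row)) for row in counts]
    overall = sum(w * m for w, m in zip(weights, means))
    return sum(w * (m - overall) ** 2 for w, m in zip(weights, means))

# Hypothetical table: 2 groups, 4 qualitative modes (k = 3).
COUNTS = [[10, 20, 30, 40], [25, 25, 25, 25]]
X = [1.0, 0.5, 2.0]
A = build_A(COUNTS)
quad_A = sum(X[j] * A[j][h] * X[h] for j in range(3) for h in range(3))
```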
i=l 
Similarly, since: 

$$u_{k+1} - \bar u = \sum_{j=1}^{k+1} (u_{k+1} - u_j) f_{0j} = \sum_{j=1}^{k} \sum_{h=j}^{k} x_h f_{0j} = (x_1 + \dots + x_k) f_{01} + (x_2 + \dots + x_k) f_{02} + \dots + x_k f_{0k} = \sum_{j=1}^{k} x_j F_{0j},$$


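the variance of the total distribution is equal to: 

$$
\begin{aligned}
\sigma^2 &= \sum_{j=1}^{k+1} (u_j - \bar u)^2 f_{0j} = \sum_{j=1}^{k+1} \Big( \sum_{h=1}^{k} x_h F_{0h} - \sum_{h=j}^{k} x_h \Big)^2 f_{0j} \\
&= \sum_{j=1}^{k} \Big( \sum_{h=j}^{k} x_h \Big)^2 f_{0j} - \Big( \sum_{j=1}^{k} x_j F_{0j} \Big)^2 \\
&= \sum_{j=1}^{k} \Big( \sum_{h=j}^{k} x_h \Big)^2 f_{0j} - \sum_{j=1}^{k}\sum_{h=1}^{k} x_j x_h F_{0j} F_{0h}.
\end{aligned}
$$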


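If we also consider that: 

$$\sum_{j=1}^{k} \Big( \sum_{h=j}^{k} x_h \Big)^2 f_{0j} = \sum_{j=1}^{k} x_j^2 F_{0j} + 2 \sum_{r=1}^{k-1} \sum_{s=r+1}^{k} x_r x_s F_{0r},$$

$$\sum_{j=1}^{k}\sum_{h=1}^{k} x_j x_h F_{0j} F_{0h} = \sum_{j=1}^{k} x_j^2 F_{0j}^2 + 2 \sum_{r=1}^{k-1} \sum_{s=r+1}^{k} x_r x_s F_{0r} F_{0s},$$

then the total variance becomes: 

$$\sigma^2 = \sum_{j=1}^{k} x_j^2 F_{0j} (1 - F_{0j}) + 2 \sum_{r=1}^{k-1} \sum_{s=r+1}^{k} x_r x_s F_{0r} (1 - F_{0s}) = x'Bx \qquad (3)$$

considering $x = [x_1, \dots, x_k]'$ and $B = [b_{jh}]$ as a positive definite 
symmetrical matrix with: 

$$b_{jh} = F_{0j} (1 - F_{0h}) \quad (j \le h = 1, \dots, k), \qquad b_{hj} = b_{jh}. \qquad (4)$$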
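Formula (4) can be checked in the same spirit: the Python sketch below builds B from the cumulative frequencies of the total distribution and compares x'Bx with the variance computed directly from the modes. The function names and the frequencies are hypothetical, and $u_{k+1}$ is again set to 0.

```python
def build_B(F_total):
    """Matrix B of (4): b_jh = F_0j * (1 - F_0h) for j <= h, symmetric."""
    k = len(F_total)
    return [[F_total[min(j, h)] * (1.0 - F_total[max(j, h)]) for h in range(k)]
            for j in range(k)]

def total_variance(f_total, x):
    """Variance of the total distribution computed from the modes, u_{k+1} = 0."""
    k = len(x)
    u = [0.0] * (k + 1)
    for j in range(k - 1, -1, -1):
        u[j] = u[j + 1] - x[j]
    mean = sum(uj * fj for uj, fj in zip(u, f_total))
    return sum(fj * (uj - mean) ** 2 for uj, fj in zip(u, f_total))

# Hypothetical relative frequencies of the k + 1 = 4 modes.
F_REL = [0.175, 0.225, 0.275, 0.325]
F_CUM = [0.175, 0.400, 0.675]          # cumulative frequencies F_01, F_02, F_03
X = [1.0, 0.5, 2.0]
B = build_B(F_CUM)
quad_B = sum(X[j] * B[j][h] * X[h] for j in range(3) for h in range(3))
```

Dividing x by the square root of `quad_B` rescales the quantification so that the constraint x'Bx = 1 holds.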




2. Comparison among the average evaluations expressed by the 
individuals of two different groups about the same feature 


When the aim of the research is to establish if the individuals of a group 
express on average evaluations or attitudes about the same stimulating 
phenomenon that eventually turn out to be more positive than those 
coming from another group, it is possible to assume the difference be- 
tween the averages of the two distributions related to the quantified fea- 
ture as objective function: 


$$\bar u_1 - \bar u_2 = \sum_{j=1}^{k} (F_{2j} - F_{1j})\, x_j = \sum_{j=1}^{k} a_j x_j = a'x \qquad (5)$$

In fact, it is evident that if, for instance, this difference is positive 
for every quantification, then the individuals of the first group express 
on average evaluations that are certainly more positive than those 
expressed by the second group. Besides, (5) makes it clear that if 
$a_j = F_{2j} - F_{1j} > 0$ for each $j \le k$, then $\bar u_1 > \bar u_2$ in any 
quantification, whereas if $a_j < 0$ for each j, then $\bar u_1 < \bar u_2$ in 
any quantification. 
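The sign check on the coefficients of (5) can be sketched in Python as follows. The function names, the verdict strings and the cumulative frequencies are hypothetical.

```python
def difference_coefficients(F1, F2):
    """Coefficients a_j = F_2j - F_1j of the linear objective function (5)."""
    return [f2 - f1 for f1, f2 in zip(F1, F2)]

def sign_verdict(a):
    """Say whether the sign of the mean difference is fixed for every x >= 0."""
    if all(v > 0 for v in a):
        return "group 1 higher in every quantification"
    if all(v < 0 for v in a):
        return "group 2 higher in every quantification"
    return "depends on the quantification"

# Hypothetical cumulative relative frequencies of two groups (k = 3).
a_example = difference_coefficients([0.10, 0.30, 0.60], [0.25, 0.50, 0.75])
verdict = sign_verdict(a_example)
```

Only the third outcome requires the constrained optimization: in the first two, the comparison is already settled whatever distances are chosen.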

So the problem arises when the $a_j$ do not all have the same sign. In 
fact, in that case, for each j there is at least one quantification that 
makes the difference between the averages have the same sign as $a_j$: for 
instance, it is sufficient to set $x_j > 0$ and $x_h = 0$ for each $h \ne j$. 
This represents a further confirmation of the fact that the 
directly-determined quantification method cannot be proposed. 

If this happens, it is necessary to give a quantification criterion that 
makes it possible to establish the conditions under which the individuals 
of the first group express on average evaluations that are more or less 
positive than those expressed by the second group. 

Concerning this, a useful criterion could be to bring the comparison 
between these evaluations back to the comparison between the averages of 
the maximizing and minimizing distributions of the objective function 
$\bar u_1 - \bar u_2 = \sum_{j=1}^{k} a_j x_j$, imposing as constraints that 
$x_j \ge 0$ and that a variability index should be constant, considering, 
for instance, the total variance as unitary (indeed, if you set 
$\sigma^2 = d^2$, then all the $x_j$ that you obtain would be multiplied by 
d): it is necessary to set this last constraint so that the problem can 
have a solution. 




Besides, it is obvious that fixing these constraints implies that the 
solution point, identified by k coordinates, is not free to move in 
the $S^k$ space but is “bound” to stay on a segment of the hyper-surface 
defined by the equation of the constraint (for instance, if $\sigma^2$ is a 
positive definite quadratic form, $\sigma^2 = 1$ is a hyper-ellipsoid of 
$S^k$) and by $x_j \ge 0$. 

So the $x_j$ values are obtained by optimizing the mixed system: 

$$
\begin{cases}
\bar u_1 - \bar u_2 = \sum_{j=1}^{k} a_j x_j = \text{Max (min)} \\
x_j \ge 0 \\
\sigma^2 = 1
\end{cases}
\qquad \text{that is} \qquad
\begin{cases}
a'x = \text{Max (min)} \\
x_j \ge 0 \\
x'Bx = 1
\end{cases}
\qquad (6)
$$


After calculating the $x_j$, it is possible to obtain the $u_j$ up to an 
arbitrary constant (that is why this method is also called “indeterminate 
quantification”), that is, it is possible to find the quantitative modes of 
the maximizing and minimizing distributions of the objective function. 
So we use $\bar u'_1$ and $\bar u'_2$ to indicate the averages of the two 
groups that are associated with the maximizing quantification, while 
$\bar u''_1$ and $\bar u''_2$ indicate the averages of the two groups that 
are associated with the minimizing quantification: if the interval 
$[\bar u''_1 - \bar u''_2,\ \bar u'_1 - \bar u'_2]$ has positive bounds, then 
$\bar u_1 > \bar u_2$ will be true in any quantification; but if both 
bounds of this interval are negative, then $\bar u_1 < \bar u_2$ will be 
true in all the quantifications. 

Obviously, if we changed the constraint (that is, if we assumed, for 
example, that the standard deviation of the marginal distribution should 
be constant instead of setting $\sigma^2 = 1$), then we would obtain 
different solutions, but the sign of the difference $\bar u_1 - \bar u_2$ 
for the maximizing and minimizing distributions would remain the same 
(Herzel 1974a, p.11). 

When the interval $[\bar u''_1 - \bar u''_2,\ \bar u'_1 - \bar u'_2]$ contains 
0, it is possible to compare the averages of the two groups, obtained for 
the maximizing and the minimizing distribution of the objective function, 
with the appropriate tests that are usually used to compare averages: for 
example, the Welch-Aspin test or the Z test can be used in the case of 
large samples. 

So, when the above-mentioned interval contains 0, the following 
circumstances can occur: 

1. both $\bar u''_1 - \bar u''_2$ and $\bar u'_1 - \bar u'_2$ are not 
significant: this means that the two groups do not express 
significantly different evaluations; 

2. $\bar u''_1 - \bar u''_2$ is not significant, whereas $\bar u'_1 - \bar u'_2$ 
is significant: since $\bar u'_1 - \bar u'_2 > 0$, this means that the 
individuals of the first group do express on average evaluations that 
are more positive than those expressed by the second group; 

3. $\bar u''_1 - \bar u''_2$ is significant, whereas $\bar u'_1 - \bar u'_2$ 
is not significant: since $\bar u''_1 - \bar u''_2 < 0$, this means that the 
individuals of the second group do express on average evaluations that 
are more positive than those expressed by the first group; 

4. both $\bar u''_1 - \bar u''_2$ and $\bar u'_1 - \bar u'_2$ are significant: 
since both differences are significant and have opposite signs, it is 
necessary to fix a criterion that can help establish which group 
expresses the most positive evaluations: this criterion could consist in 
regarding $\bar u_1 > \bar u_2$ as more probable if the width of the 
positive semi-interval is considerably bigger (for example, at least 
double) than the width of the negative semi-interval, and vice versa 
for $\bar u_1 < \bar u_2$; nothing can be said if the aforementioned 
semi-intervals have approximately the same size. 
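The decision rule for two groups, including the four circumstances listed above, can be sketched in Python as follows. The function name, the verdict strings and the argument names are hypothetical: `d_min` and `d_max` stand for the differences between the averages under the minimizing and the maximizing quantification, the two flags for the outcome of whichever significance test is applied, and `ratio=2.0` for the "at least double" example given in the text.

```python
def compare_two_groups(d_min, d_max, significant_min, significant_max, ratio=2.0):
    """Verdict on two groups from the interval [d_min, d_max] of mean differences."""
    if d_min > d_max:
        raise ValueError("d_min must not exceed d_max")
    if d_min > 0:
        return "group 1 higher in every quantification"
    if d_max < 0:
        return "group 2 higher in every quantification"
    # From here on the interval contains 0.
    if not significant_min and not significant_max:
        return "no significant difference"
    if significant_max and not significant_min:
        return "group 1 higher"
    if significant_min and not significant_max:
        return "group 2 higher"
    positive_width, negative_width = d_max, -d_min
    if positive_width >= ratio * negative_width:
        return "group 1 probably higher"
    if negative_width >= ratio * positive_width:
        return "group 2 probably higher"
    return "undecided"

example_verdict = compare_two_groups(-0.1, 0.4, True, True)
```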


3. Comparison among the average evaluations expressed by the in- 
dividuals of many different groups about the same feature 


When the distributions of the same feature are more than two, it is es- 
sential — even in this case — to make reference to the maximizing and 
minimizing distributions of the objective function in order to test the 
equality hypotheses between many average evaluations: so, first of all, 
it is necessary to find the objective function that allows to compare this 
kind of average evaluations. 

Besides, it is evident that this function must be able to assess 
statistically the difference between the average evaluations of the 
different groups: therefore, an appropriate function can be given by a 
variability index of the averages (indeed, if this index is, for example, 
not significantly different from 0, then the averages can be considered 
as equivalent): the simplest objective function is therefore the 
variance (1) of the averages of the groups. 

So the problem consists in finding the quantification that maximizes 
(or minimizes) the variance of the partial averages with the usual 
constraint needed to get a solution, namely that the variance of the 
marginal distribution is equal to 1; that is, in optimizing the system: 

$$
\begin{cases}
\sum_{i=1}^{c} (\bar u_i - \bar u)^2 f_{i0} = \text{Max (min)} \\
x_j \ge 0 \\
\sigma^2 = 1
\end{cases}
\qquad \text{that is} \qquad
\begin{cases}
x'Ax = \text{Max (min)} \\
x_j \ge 0 \\
x'Bx = 1
\end{cases}
\qquad (7)
$$

Once the maximizing and minimizing quantifications of the objective
function have been determined, it is possible to apply the methods that
test the homogeneity of the averages of the distributions associated
with them: for instance, when the samples are large, the Welch test can
be used.

Obviously, the minimum value of the variance between groups is 0 when
the averages of the various groups are all equal to one another; but
since the solution point is bound to lie on the hyper-ellipsoid
x'Bx = 1, this minimum value may also be non-zero.

Again, there are various possibilities:

1. the averages of the maximizing distributions are not significantly 
different among themselves: in this case even the averages of the 
distributions that are associated to any other quantification are not 
significantly different among themselves, because those distribu- 
tions have a lower variance value among their averages; 

2. the averages of the distributions that are associated to the
minimizing quantification are significantly different among
themselves: in this case the averages of the distributions that are
associated to any other quantification are significantly different
among themselves as well, because those distributions have a
higher variance value among their averages;

3. the averages of the minimizing distributions are not significantly 
different among themselves, whereas the averages of the maxim- 
izing distributions are significantly different: in this case only the 
maximizing distributions will be studied; 

4. the averages of the groups are significantly different both in the 
minimizing distributions and in the maximizing distributions. 

After determining the quantifications (both the minimizing and the
maximizing one, or just one of them) for which the averages are
significantly different, it is necessary to compare the averages within
each quantification by using the multiple comparison method, in order
to verify whether all the averages are really different from one
another or only some of them (even just one).

After identifying the different averages for both quantifications (or
for just one of them), it is possible to compare the differences with
the procedure provided for two samples.


4. An outline on genetic algorithms and the procedure that has 
been adopted in this paper 


Genetic algorithms are a computational model inspired by Darwin's
theory of evolution, which is based on two essential principles:
genetic variation and natural selection (Cammarata, 1994).

In nature, pairs of individuals mate at each generation to produce new
individuals whose genetic inheritance results from the combination of
their parents' genetic inheritances. The survival of each individual is
linked to its ability to adapt to the surrounding environment (the
so-called fitness), and this fitness depends on the individual's
genetic inheritance: those who have a better fitness are on average
more favored than all the others (the so-called natural selection).

Therefore, if the individuals' best characteristics can be transferred,
it will also be possible to obtain better offspring that adapt more
easily to the surrounding environment.




Even in the case of genetic algorithms the survival of an "individual"
(i.e. a possible solution to the problem) is linked to its fitness: so
it is necessary to fix a fitness score for each individual, and this
score will depend on the particular problem to be solved. In our case,
we assume as fitness function a linear function of the objective
function, that is:

$$F_{fit}(x_1, \dots, x_k) = k_1\, f_{ob}(x_1, \dots, x_k) + k_2 \qquad (8)$$

where $k_1$ is a scale factor used to improve the convergence (in our
case $k_1 = \pm 100$, depending on whether a maximization or a
minimization of the objective function is needed), whereas $k_2$ is a
threshold value used to avoid any negative fitness (in our case
$k_2 = 100$).
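
As a minimal illustration of (8), the following Python sketch maps an objective value to a fitness score. The function and parameter names are assumptions made for this example; only the constants 100 and the sign convention come from the text.

```python
def fitness_from_objective(objective_value, maximize=True, scale=100.0, offset=100.0):
    """Linear fitness k1 * f + k2, with k1 = +scale or -scale and k2 = offset."""
    k1 = scale if maximize else -scale
    return k1 * objective_value + offset
```

For a minimization, a more negative objective value gives a higher fitness, so the same "highest fitness survives" rule can be used in both cases.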

In nature, an individual's genetic inheritance is contained in its
chromosomes, which are in turn composed of genes that have a specific
position inside them (the so-called locus genicus). In computational
models too, an individual's genetic inheritance (that is, the set of all
the intrinsic characteristics of a possible solution) is contained in
its chromosomes: each of them represents one of the individual's
particular features and is usually coded by a string of bits, whereas
the genes it is composed of can be either single bits or small blocks of
adjacent bits; taken together, these bits encode a particular individual.

The simplest models are obviously those that have only one chromosome
(the so-called haploid individuals), which characterizes the individual
by itself. In our case we have decided to use such a simple model.

During sexual reproduction a part of the father's chromosome exchanges
its place with the corresponding part of the mother's chromosome (the
so-called crossover process), thus giving birth to a new genetic
inheritance (i.e. the child). Similarly, for computational models,
techniques have been studied that exchange the genes (i.e. the blocks
of bits) of two chromosomes (i.e. of two strings of bits), thus giving
birth to a child chromosome.

Since in nature a gene can undergo mutation during the passage from
parent to child, this has been reproduced in computational models as
well: to this purpose, techniques have been studied that replace the
bit of a locus chosen at random, thus giving birth to mutated genes,
i.e. to a new element of the reproduced population.

In order to solve our problem by means of a genetic algorithm, we
randomly select m points (i.e. the individuals of the starting
population) among the values of the domain of the function that comply
with the constraints, and then we assess the fitness of each of the m
points. Since the fitness of the starting individuals is generally not
high (because they were chosen at random), we make the population evolve
so that at least one acceptable solution, having high fitness, emerges
after a reasonable number of generations. To obtain this evolution, we
apply some operations inspired by genetics to one or more individuals
chosen according to probabilistic criteria fixed by the operator
(Chipperfield, Fleming, Fonseca, 1995).

After explaining the general principles on which genetic algorithms 
are based, let us show now the procedure that we have used to solve the 
problem about the quantification of the ordinal variables. 

Since an m value that is too high or too low can imply very long
calculation times or convergence problems, we have found that in our
case it is sufficient to assume m = 100 (k-1).

Besides, since it is very complicated to define within the hyper-space
the ranges of variation of the $x_j$ that comply with the constraints,
we have decided to move to polar coordinates: with these coordinates
the independent variables are k-1 angles in $[0, \pi/2]$ for each
individual, whereas the radius vector r is calculated from the equality
constraint equation.
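
The change of coordinates can be sketched as follows in Python. This is an illustration under stated assumptions, not the authors' code: the hyperspherical parametrization shown is one standard choice, the function names are hypothetical, and the matrix B is the one printed in Section 6.

```python
import math

# Matrix B as printed in Section 6 (constraint x'Bx = 1)
B = [[0.04838, 0.03729, 0.01679, 0.00510],
     [0.03729, 0.19647, 0.08849, 0.02686],
     [0.01679, 0.08849, 0.22090, 0.06706],
     [0.00510, 0.02686, 0.06706, 0.09000]]


def quad(M, x):
    """Quadratic form x'Mx."""
    n = len(x)
    return sum(x[i] * M[i][j] * x[j] for i in range(n) for j in range(n))


def direction(angles):
    """Unit vector with non-negative components from k-1 angles in [0, pi/2]."""
    d, s = [], 1.0
    for theta in angles:
        d.append(s * math.cos(theta))
        s *= math.sin(theta)
    d.append(s)
    return d


def to_cartesian(angles, M):
    """Scale the direction by the radius r that satisfies x'Mx = 1."""
    d = direction(angles)
    r = 1.0 / math.sqrt(quad(M, d))
    return [r * v for v in d]
```

Because every angle lies in $[0, \pi/2]$, all sines and cosines are non-negative, so the constraint $x_j \ge 0$ holds automatically and only the angles need to be searched.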

All things considered, it is first necessary to calculate a, A and B,
and then it is possible to proceed as follows:

1. start by arbitrarily fixing a (k-1)-tuple of angles for each of the
m individuals: $(\theta_1, \theta_2, \dots, \theta_{k-1})$, with
$0 \le \theta_i \le \pi/2$ for i = 1, 2, ..., k-1;

2. turn the angles $\theta_i$ into binary numbers for each individual;

3. associate only one chromosome (i.e. a string of bits) to each
individual by concatenating the previous binary numbers;

4. apply the crossover to pairs of individuals chosen at random without
repetition, establishing that the new generation has the same size as
the previous one and assuming 1 as crossover probability, i.e. that
each mating produces a pair of children (Covitti, Delvecchio, Neri,
Sylos Labini, 2003);

5. carry out gene mutation, with a low probability, on the individuals
of this new generation (we have assumed a probability lower than 0.04);

6. turn the binary numbers associated to the children into angles
$\theta_i$;

7. calculate the radius r for each individual by using the angle values
and applying the constraint equation;

8. calculate the k-tuple $(x_1, \dots, x_k)$ from the k-tuple
$(\theta_1, \dots, \theta_{k-1}, r)$ for each individual by turning the
polar coordinates into Cartesian coordinates of $S^k$ (Ghizzetti, 1952);

9. calculate the fitness score for each individual;

10. put the individuals of the new generation and those of the old one
in order according to their fitness values; then choose the half of
them with the highest fitness values (the so-called reinsertion
process);

11. iterate this procedure until the stop criterion is reached: in our
case, it consists in calculating, at each iteration, the difference
between the maximum fitness and the average fitness (Delvecchio, Neri,
Sylos Labini, 2002), and in stopping when this difference is lower than
a certain prefixed quantity (in our case it is 10%).
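
The steps above can be sketched as a compact Python program for the linear objective a'x of Section 6. It is a sketch under assumptions, not the authors' implementation: the 16-bit encoding, the one-point crossover, the single-bit mutation, the tolerance and the cap on generations are illustrative choices, and the starting population is drawn directly as bit strings instead of converting angles to binary.

```python
import math
import random

# Data printed in Section 6: vector a (Table 1) and matrix B
A_VEC = [0.01609, -0.01750, 0.00526, -0.02039]
B = [[0.04838, 0.03729, 0.01679, 0.00510],
     [0.03729, 0.19647, 0.08849, 0.02686],
     [0.01679, 0.08849, 0.22090, 0.06706],
     [0.00510, 0.02686, 0.06706, 0.09000]]
BITS = 16  # bits per angle (assumption)


def quad(M, x):
    """Quadratic form x'Mx."""
    return sum(x[i] * M[i][j] * x[j] for i in range(len(x)) for j in range(len(x)))


def decode(chrom):
    """Steps 6-8: bit string -> angles in [0, pi/2] -> point x with x'Bx = 1."""
    d, s = [], 1.0
    for i in range(0, len(chrom), BITS):
        word = int("".join(map(str, chrom[i:i + BITS])), 2)
        theta = word / (2 ** BITS - 1) * math.pi / 2
        d.append(s * math.cos(theta))
        s *= math.sin(theta)
    d.append(s)
    r = 1.0 / math.sqrt(quad(B, d))  # radius from the equality constraint
    return [r * v for v in d]


def objective(x):
    """Linear objective a'x."""
    return sum(a * v for a, v in zip(A_VEC, x))


def fitness(chrom, k1=100.0, k2=100.0):
    """Step 9: linear function of the objective, as in (8), for a maximization."""
    return k1 * objective(decode(chrom)) + k2


def genetic_scaling(m=300, p_mut=0.03, tol=1e-4, max_gen=200, seed=0):
    """Return the best point found; m must be even so that pairs can be formed."""
    rng = random.Random(seed)
    n_bits = BITS * (len(A_VEC) - 1)
    # Steps 1-3: starting population, drawn directly as random bit strings
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(m)]
    for _ in range(max_gen):
        # Step 4: one-point crossover on random pairs, with probability 1
        order = pop[:]
        rng.shuffle(order)
        children = []
        for p, q in zip(order[0::2], order[1::2]):
            cut = rng.randrange(1, n_bits)
            children += [p[:cut] + q[cut:], q[:cut] + p[cut:]]
        # Step 5: flip one randomly chosen bit with low probability
        for child in children:
            if rng.random() < p_mut:
                locus = rng.randrange(n_bits)
                child[locus] = 1 - child[locus]
        # Step 10: reinsertion, keep the best m among parents and children
        scored = sorted(((fitness(c), c) for c in pop + children),
                        key=lambda t: t[0], reverse=True)[:m]
        pop = [c for _, c in scored]
        scores = [f for f, _ in scored]
        # Step 11: stop when maximum and average fitness are close enough
        if scores[0] - sum(scores) / m < tol:
            break
    return decode(pop[0])
```

Since the search is stochastic, a run is not guaranteed to reproduce the printed solution exactly; every returned point, however, satisfies both constraints by construction, and the reinsertion step ensures that the best fitness never decreases from one generation to the next.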

For further details, see the previous chapter of this book.




5. Rounding off the solutions 


Since the calculation method gives approximate solutions, it is
necessary to fix a general criterion to round off the $x_j$ values.

To this purpose, let us calculate the maximum variation that the $x_j$
can undergo so that the resulting variation of the objective function
does not exceed an arbitrarily small quantity $\varepsilon > 0$. Since
the total differential of a function is an estimate of the variation of
the function corresponding to an infinitesimal variation of the
independent variables, let us write the total differential of the two
objective functions (i.e. the linear function and the quadratic one)
that we have studied so far. If we set $\Delta x = \max_j |\Delta x_j|$,
then we have (Delvecchio, 2010, p. 89):




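$$|dy| = \left|\sum_{j=1}^{k} \frac{\partial (a'x)}{\partial x_j}\,\Delta x_j\right| = \left|\sum_{j=1}^{k} a_j\,\Delta x_j\right| \le \sum_{j=1}^{k} |a_j|\,|\Delta x_j| \le \Delta x \sum_{j=1}^{k} |a_j| \qquad (9)$$

$$|dy| = \left|\sum_{i=1}^{k} \frac{\partial (x'Ax)}{\partial x_i}\,\Delta x_i\right| = \left|\sum_{i=1}^{k}\Big(2\sum_{j=1}^{k} a_{ij}x_j\Big)\Delta x_i\right| \le 2\sum_{i=1}^{k}\Big|\sum_{j=1}^{k} a_{ij}x_j\Big|\,|\Delta x_i| \le 2\,\Delta x \sum_{i=1}^{k}\Big|\sum_{j=1}^{k} a_{ij}x_j\Big| \qquad (10)$$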


where the absolute values inside the summations account for the "worst
case" in the assessment of the error $|dy|$ (Savino, 1992): since all
the summed terms are positive (whatever the sign of the components of
the original vector), the resulting bound on $\Delta x$ is smaller and,
as a consequence, so is the error in the assessment of the objective
function.

Therefore, choosing $\Delta x$ so that the quantification obtained with
the rounded-off $x_j$ gives an error $|dy| \le \varepsilon$ on the
objective function, and considering (9) and (10), we have:

$$\Delta x \le \varepsilon \Big/ \sum_{j} |a_j| \quad \text{and} \quad \Delta x \le \varepsilon \Big/ \Big(2\sum_{i}\Big|\sum_{j} a_{ij}x_j\Big|\Big), \qquad i, j = 1, \dots, k. \qquad (11)$$

So we will consider the values $x_j < \Delta x$ as equal to zero and, as
a consequence, $u_{j+1} = u_j$.
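
A minimal Python sketch of the two thresholds in (11) and of the rounding rule follows. The function names are illustrative assumptions; the example data (vector a from Table 1, matrix A and the maximizing solution from Section 7, and the tolerance $10^{-4}$) are the ones used in the applications later in this chapter.

```python
def threshold_linear(a, eps=1e-4):
    """Delta x bound for the linear objective a'x."""
    return eps / sum(abs(v) for v in a)


def threshold_quadratic(A, x, eps=1e-4):
    """Delta x bound for the quadratic objective x'Ax."""
    rows = [sum(A[i][j] * x[j] for j in range(len(x))) for i in range(len(x))]
    return eps / (2 * sum(abs(r) for r in rows))


def round_off(x, dx):
    """Set to zero every component smaller than the threshold dx."""
    return [0.0 if v < dx else v for v in x]


a = [0.01609, -0.01750, 0.00526, -0.02039]
A = [[0.0001, -0.0001, -0.0003, 0.0001],
     [-0.0001, 0.0037, 0.0016, 0.0025],
     [-0.0003, 0.0016, 0.0034, 0.0019],
     [0.0001, 0.0025, 0.0019, 0.0032]]
x_max = [0.0, 0.9002, 0.0, 2.7996]

dx_linear = threshold_linear(a)
dx_quadratic = threshold_quadratic(A, x_max)
```

Because A is printed with only four decimals, the quadratic threshold computed here agrees with the value reported in Section 7 only to about two significant digits.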

The programme that we have written to determine the required
quantification also implements the statistical significance tests for
the most common problems of comparison among averages.

The method is validated in the following sections by applying it to two
real cases.


6. Comparison among the opinions expressed by graduates 
(grouped according to their current job conditions) about the 
effectiveness of the university education they received in terms 
of job exploitability 


A telephone survey of graduates who obtained their degrees over a
five-year period was conducted three years after the last observed
degree, in order to investigate the graduates' placement; the people
involved in this research were randomly chosen from the University
graduates' list and were stratified according to their degree course.




Table 1. Distribution of the evaluations expressed by a sample of university
graduates (grouped according to their current job conditions) about the
effectiveness of the university education they received in terms of job
exploitability. Scheme for computing $a_j$.

                Cumulative relative frequencies
                according to the current job condition
Evaluations     Employed    Unemployed    Total        $a_j$
                people      people
Insufficient    0.04297     0.05906       0.05098      0.01609
Poor            0.27734     0.25984       0.26863     -0.01750
Fairly good     0.66797     0.67323       0.67059      0.00526
Good            0.91016     0.88976       0.90000     -0.02039
Excellent       1.00000     1.00000       1.00000      0.00000


So far, 510 interviews have been carried out at the Faculties of
Economics, Law, Humanities and Mathematical, Physical and Natural
Sciences.

Table 1 contains the distribution of the evaluations that the 510
interviewed people expressed about the effectiveness of the university
education they received in terms of job exploitability (the graduates
have been divided according to their current job conditions); it also
contains the $a_j$ elements.

In particular, the aim is to verify whether the current job condition,
and hence the different work experience, has an influence on the
graduates' evaluations.

Since the $a_j$ do not all have the same sign, it is not possible to say
whether the employed graduates express evaluations that are more
positive or less positive than those of the unemployed graduates.

Therefore, let us now carry out the analysis using the constrained
maxima and minima of the objective function
$\bar{u}_1 - \bar{u}_2 = \sum_j a_j x_j = a'x$.

In order to optimize system (6), after calculating the symmetrical
matrix by using (4):

$$B = \begin{pmatrix}
0.04838 & 0.03729 & 0.01679 & 0.00510 \\
0.03729 & 0.19647 & 0.08849 & 0.02686 \\
0.01679 & 0.08849 & 0.22090 & 0.06706 \\
0.00510 & 0.02686 & 0.06706 & 0.09000
\end{pmatrix}$$

and then fixing $\varepsilon = 10^{-4}$, since, because of (11):

$$\Delta x \le \varepsilon \Big/ \sum_j |a_j| = 10^{-4}/0.05924 = 0.00169$$

the solution of system (6) that maximizes the objective function is:

$$x_1 = 4.54629, \quad x_2 = 0, \quad x_3 = 0, \quad x_4 = 0.$$
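
The printed matrix and solution can be checked with a short Python sketch. Formula (4) is not reproduced in this section, so the expression used below, $B_{ij} = F_{\min(i,j)}\,(1 - F_{\max(i,j)})$ with $F$ the cumulative relative frequencies of the whole sample, is an assumption; it is adopted here only because it reproduces the printed entries to the printed precision.

```python
# Cumulative relative frequencies of the whole sample (Table 1, "Total" column)
F = [0.05098, 0.26863, 0.67059, 0.90000]
# Vector a from Table 1
a = [0.01609, -0.01750, 0.00526, -0.02039]


def variance_matrix(F):
    """Assumed form of (4): B[i][j] = F[min(i, j)] * (1 - F[max(i, j)])."""
    k = len(F)
    return [[F[min(i, j)] * (1 - F[max(i, j)]) for j in range(k)] for i in range(k)]


B = variance_matrix(F)
x = [4.54629, 0.0, 0.0, 0.0]          # maximizing solution reported in the text

constraint = sum(x[i] * B[i][j] * x[j] for i in range(4) for j in range(4))
objective = sum(aj * xj for aj, xj in zip(a, x))
```

With these inputs the constraint x'Bx evaluates to about 1 and the objective a'x to about 0.0731, in line with the difference between the averages reported for the maximizing distribution.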


Therefore, apart from an arbitrary constant C, the quantification to be
attributed to the categories of the feature that maximizes the
difference between the averages of the two groups is:

$$u_1 = C, \qquad u_2 = u_3 = u_4 = u_5 = C + 4.54629.$$

For example, assuming C = 0, the observed distributions and the
associated statistical variables (s. v.) that maximize the objective
function are those contained in Table 2.


Table 2. Maximizing Distribution.

                           Frequencies
Evaluations     U          $n_{1j}$             $n_{2j}$
                           (Employed people)    (Unemployed people)
Insufficient    0          11                   15
Other           4.54629    245                  239
Total                      256                  254


From this table we can obtain:

$$\bar{u}'_1 = 4.35098, \quad \bar{u}'_2 = 4.27785, \quad \bar{u}'_1 - \bar{u}'_2 = 0.07313,$$
$$(s_1^2)' = 0.8532864, \quad (s_2^2)' = 1.1530579.$$

The solution of system (6) that minimizes the objective function is:

$$x_1 = 0, \quad x_2 = 0.82956, \quad x_3 = 0, \quad x_4 = 2.86209,$$

and thus:

$$u_1 = u_2 = C, \quad u_3 = u_4 = C + 0.82956, \quad u_5 = C + 3.69165.$$

For example, assuming again C = 0, the observed distributions and the
associated statistical variables that minimize the objective function
are those described in Table 3.




Table 3. Minimizing Distribution.

                                      Frequencies
Evaluations               U           $n_{1j}$             $n_{2j}$
                                      (Employed people)    (Unemployed people)
Insufficient and Poor     0           71                   66
Fairly good and Good      0.82956     162                  160
Excellent                 3.69165     23                   28


From these distributions we obtain:

$$\bar{u}''_1 = 0.85663, \quad \bar{u}''_2 = 0.92951, \quad \bar{u}''_1 - \bar{u}''_2 = -0.07288,$$
$$(s_1^2)'' = 0.9296921, \quad (s_2^2)'' = 1.0760330.$$

Since the interval
$[\bar{u}''_1 - \bar{u}''_2;\ \bar{u}'_1 - \bar{u}'_2] = [-0.07288;\ 0.07313]$
contains 0, it is first necessary to test whether the differences
between the averages of the two groups, for the maximizing distribution
and for the minimizing one, are significant.

To do this, since large samples are involved, we use the following test
statistic

$$z = \frac{\bar{u}_2 - \bar{u}_1}{\sqrt{(s_1^2/g_1) + (s_2^2/g_2)}} \qquad (12)$$

with $g_1 = n_1 - 1$ and $g_2 = n_2 - 1$, which, as is well known, is
approximately distributed as N(0,1) if the null hypothesis is true.

So, fixing $\alpha = 0.05$ and denoting by $z'$ and $z''$ the values
assumed by the previous test for the two distributions (i.e. the
maximizing distribution and the minimizing one), respectively, we obtain:

$$z' = -0.822 > -z_{0.05} = -1.645, \qquad z'' = 0.826 < z_{0.05} = 1.645.$$


We can therefore conclude (rule No. 1, from section 3) that, among the
graduates who attended those Faculties and obtained their degree in
those years, employed and unemployed people most likely do not express
different evaluations about the effectiveness of the university
education they received in terms of job exploitability.
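
The large-sample test of this section can be sketched in Python from the frequencies of Tables 2 and 3. The function names are illustrative, and the order of subtraction (second group minus first) is an assumption chosen so that the signs agree with the printed values; the variances are the unbiased ones, divided by $g_i = n_i - 1$ as in (12).

```python
import math


def mean_and_variance(values, freqs):
    """Mean and unbiased variance of a grouped distribution."""
    n = sum(freqs)
    mean = sum(v * f for v, f in zip(values, freqs)) / n
    var = sum(f * (v - mean) ** 2 for v, f in zip(values, freqs)) / (n - 1)
    return n, mean, var


def z_statistic(values, freqs1, freqs2):
    """Difference of the two averages over its estimated standard error."""
    n1, m1, v1 = mean_and_variance(values, freqs1)
    n2, m2, v2 = mean_and_variance(values, freqs2)
    return (m2 - m1) / math.sqrt(v1 / (n1 - 1) + v2 / (n2 - 1))


# Maximizing distribution (Table 2) and minimizing distribution (Table 3)
z_max = z_statistic([0.0, 4.54629], [11, 245], [15, 239])
z_min = z_statistic([0.0, 0.82956, 3.69165], [71, 162, 23], [66, 160, 28])
```

Both values come out at about 0.82 in absolute value, well inside the interval (-1.645, 1.645), which is consistent with the conclusion that the two groups do not differ significantly.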




7. Comparison among the opinions expressed by the graduates of 
different Faculties about the effectiveness of the university 
education they received in terms of job exploitability 


From the same previously mentioned survey on university graduates'
placement, Table 4 contains the evaluations given by a sample of
graduates of different Faculties: the opinions they expressed concern
the effectiveness of the university education they received in terms of
job exploitability.


Table 4. Distribution of the evaluations expressed by the graduates of four
different university Faculties about the effectiveness of the university
education they received in terms of job exploitability. Vectors:
$a_{ih} = F_i - F_h$.

                Faculties            $a_{ih} = F_i - F_h$, i > h = 1, 2, 3
Evaluations     E    H    S    L     a_21     a_31    a_41     a_32     a_42     a_43
Insufficient     4    7    6    9    0.007    0.01    0.024    0.003    0.018    0.014
Mediocre        33   28   18   32   -0.137   -0.17   -0.077   -0.033    0.060    0.093
Fairly good     40   59   60   46   -0.143   -0.07   -0.149    0.073   -0.005   -0.079
Good            20   29   24   44   -0.150   -0.07   -0.034    0.080    0.116    0.036
Excellent        3   27   12    9

Legend: E = Economics; H = Humanities; S = Mathematical, Physical and Natural Sciences; L = Law


We want to verify whether the evaluations expressed by the graduates of
the four Faculties are on average significantly different.

Table 4 also contains the vectors $a_{ih} = F_i - F_h$, where $F_i$ and
$F_h$ indicate the vectors of the cumulative relative frequencies for
the i-th and h-th Faculties, respectively.

As the Table shows, the components of each $a_{ih}$ vector do not all
have the same sign; this implies that it is not possible to say
immediately whether the graduates of some Faculties express evaluations
that are more positive than those expressed by the graduates of other
Faculties: therefore, it is necessary to solve the problem by
optimizing the objective function called "variance of the partial
averages".

After calculating the A and B matrices by using (2) and (4),
respectively:

$$A = \begin{pmatrix}
0.0001 & -0.0001 & -0.0003 & 0.0001 \\
-0.0001 & 0.0037 & 0.0016 & 0.0025 \\
-0.0003 & 0.0016 & 0.0034 & 0.0019 \\
0.0001 & 0.0025 & 0.0019 & 0.0032
\end{pmatrix}$$

$$B = \begin{pmatrix}
0.04838 & 0.03729 & 0.01679 & 0.00510 \\
0.03729 & 0.19647 & 0.08849 & 0.02686 \\
0.01679 & 0.08849 & 0.22090 & 0.06706 \\
0.00510 & 0.02686 & 0.06706 & 0.09000
\end{pmatrix}$$

and then fixing $\varepsilon = 10^{-4}$, since because of (11)

$$\Delta x \le \varepsilon \Big/ \Big(2\sum_i \Big|\sum_j a_{ij}x_j\Big|\Big) = 10^{-4}/0.0555 = 0.0018,$$

the solution of system (7) that maximizes the variance of the partial
averages is:

$$x_1 = 0, \quad x_2 = 0.9002, \quad x_3 = 0, \quad x_4 = 2.7996.$$


Therefore, apart from an arbitrary constant C, the quantification that
maximizes the variance among these averages is:

$$u_1 = u_2 = C, \quad u_3 = u_4 = C + 0.9002, \quad u_5 = C + 3.6998.$$

So, for example, assuming that C = 0, the maximizing distributions are
contained in Table 5.


Table 5. Maximizing Distributions.

                                        Frequencies per Faculty
Evaluations                  U          $n_{1j}$   $n_{2j}$   $n_{3j}$   $n_{4j}$
Insufficient and Mediocre    0          37         35         24         41
Fairly Good and Good         0.9002     60         88         84         90
Excellent                    3.6998     3          27         12         9
Total                                   100        150        120        140


The averages of these distributions are given by:

$$\bar{u}'_1 = 0.651102337, \quad \bar{u}'_2 = 1.194067891, \quad \bar{u}'_3 = 1.000104876, \quad \bar{u}'_4 = 0.816531381.$$
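
The partial averages above, and the variance among them, can be recomputed from Table 5 with a short Python sketch. The names are illustrative, and the weights $n_i/n$ used for the variance of the partial averages are an assumption, since formula (1) is not reproduced in this section.

```python
VALUES = [0.0, 0.9002, 3.6998]   # maximizing quantification with C = 0
FREQS = {                        # Table 5, one list of frequencies per Faculty
    "Economics": [37, 60, 3],
    "Humanities": [35, 88, 27],
    "Sciences": [24, 84, 12],
    "Law": [41, 90, 9],
}


def group_mean(values, freqs):
    """Average of a grouped distribution."""
    return sum(v * f for v, f in zip(values, freqs)) / sum(freqs)


def variance_of_means(means, sizes):
    """Variance of the group averages, weighted by relative group size."""
    n = sum(sizes)
    overall = sum(m * s for m, s in zip(means, sizes)) / n
    return sum(s / n * (m - overall) ** 2 for m, s in zip(means, sizes))


means = [group_mean(VALUES, f) for f in FREQS.values()]
sizes = [sum(f) for f in FREQS.values()]
between = variance_of_means(means, sizes)
```

The resulting variance, about 0.040, is close to the value of x'Ax obtained from the printed matrix A and the maximizing solution; the small gap is compatible with A being printed to four decimals.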


Since because of (11) we have $\Delta x = 0.0241$, the solution of
system (7) that minimizes the variance of the partial averages is:

$$x_1 = 4.2650, \quad x_2 = 0, \quad x_3 = 0.4805, \quad x_4 = 0;$$

therefore, the quantification that minimizes the variance among these
averages is:

$$u_1 = C, \quad u_2 = u_3 = C + 4.2650, \quad u_4 = u_5 = C + 4.7455.$$

So, for example, assuming C = 0, the minimizing distributions are
described in Table 6.


Table 6. Minimizing Distributions.

                                        Frequencies per Faculty
Evaluations                  U          $n_{1j}$   $n_{2j}$   $n_{3j}$   $n_{4j}$
Insufficient                 0          4          7          6          9
Mediocre and Fairly Good     4.2650     73         87         78         78
Good and Excellent           4.7455     23         56         36         53
Total                                   100        150        120        140


The averages of these distributions are given by:

$$\bar{u}''_1 = 4.2051, \quad \bar{u}''_2 = 4.2456, \quad \bar{u}''_3 = 4.1961, \quad \bar{u}''_4 = 4.1729.$$


After determining the maximizing and minimizing quantifications of the
variance of the partial averages, it is necessary to use a hypothesis
test for the equality of the averages (for both the maximizing and the
minimizing quantification).

Since the sample size is always equal to or higher than 100 elements,
we use the Welch test (1951), which is suitable for large samples:


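$$v^2 = \frac{\dfrac{1}{c-1}\sum_{i=1}^{c} w_i(\bar{u}_i - \tilde{u})^2}{1 + \dfrac{2(c-2)}{c^2-1}\sum_{i=1}^{c}\dfrac{1}{g_i}\Big(1 - w_i\Big/\sum_{i=1}^{c} w_i\Big)^2} \qquad (13)$$

considering:

$$\tilde{u} = \Big(\sum_i w_i\bar{u}_i\Big)\Big/\sum_i w_i, \qquad w_i = n_i/\hat{\sigma}_i^2. \qquad (14)$$

The value given by (13) must be compared with the threshold value
$f_{g_1;\,g_2;\,\alpha}$ of Snedecor's F, where:

$$g_1 = c - 1, \qquad g_2 = \mathrm{Int}\left[\frac{c^2-1}{3}\Big/\sum_{i=1}^{c}\frac{1}{g_i}\Big(1 - w_i\Big/\sum_{i=1}^{c} w_i\Big)^2\right]. \qquad (15)$$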


In order to calculate test (13), Table 7 reports the $\hat{\sigma}_i^2$
values for the two distributions (the maximizing and the minimizing
one), considering c = 4 samples.


Table 7. Variances of the maximizing and minimizing distributions for each
sample.

Variances of the maximizing        Samples
and minimizing distributions       I              II             III            IV
$(\hat{\sigma}_i^2)'$ (maximizing)  0.477689660    1.523668441    0.943724575    0.739451979
$(\hat{\sigma}_i^2)''$ (minimizing) 0.785023925    0.941057207    0.982284569    1.257362808


Computing (13) for the two distributions gives:

maximizing distribution: $v^2 = 7.6211 > f_{3;\,277;\,0.05} = 2.6371$
minimizing distribution: $v^2 = 0.1261 < f_{3;\,272;\,0.05} = 2.6377$

so we reject the hypothesis of equality of the averages only in the
case of the maximizing distribution.
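
The computation behind these two values can be sketched in Python as follows. The function name is illustrative and the formula is the standard Welch (1951) statistic for several samples, taken here as an assumption about what (13) to (15) express; the inputs are the averages and variances printed in this section.

```python
def welch_test(means, variances, sizes):
    """Welch statistic for c samples, with its two degrees of freedom."""
    c = len(means)
    w = [n / v for n, v in zip(sizes, variances)]
    total_w = sum(w)
    grand = sum(wi * m for wi, m in zip(w, means)) / total_w
    lam = sum((1 - wi / total_w) ** 2 / (n - 1) for wi, n in zip(w, sizes))
    numerator = sum(wi * (m - grand) ** 2 for wi, m in zip(w, means)) / (c - 1)
    denominator = 1 + 2 * (c - 2) / (c ** 2 - 1) * lam
    return numerator / denominator, c - 1, int((c ** 2 - 1) / (3 * lam))


sizes = [100, 150, 120, 140]

# Maximizing distributions (Table 5 averages, first row of Table 7)
v2_max, g1_max, g2_max = welch_test(
    [0.651102337, 1.194067891, 1.000104876, 0.816531381],
    [0.477689660, 1.523668441, 0.943724575, 0.739451979],
    sizes)

# Minimizing distributions (Table 6 averages, second row of Table 7)
v2_min, g1_min, g2_min = welch_test(
    [4.2051, 4.2456, 4.1961, 4.1729],
    [0.785023925, 0.941057207, 0.982284569, 1.257362808],
    sizes)
```

The threshold values of Snedecor's F are not recomputed here, because that would require a statistical library; they can be taken from the text.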

Now it is necessary to verify which of those averages are significantly
different from the others: this can be done by testing the differences
between averages by means of the multiple comparison method.

The comparisons are h = 6:

$$\psi_1 = \mu_1 - \mu_2, \quad \psi_2 = \mu_1 - \mu_3, \quad \psi_3 = \mu_1 - \mu_4,$$
$$\psi_4 = \mu_2 - \mu_3, \quad \psi_5 = \mu_2 - \mu_4, \quad \psi_6 = \mu_3 - \mu_4.$$




If $\gamma$ represents the level of significance for each comparison,
then we need $\gamma = \alpha/h$ in order for the simultaneously
considered confidence intervals to have a confidence level of at least
$1 - \alpha$. Therefore, assuming $\alpha = 0.05$, we obtain
$\gamma/2 = \alpha/(2h) = 0.00416667$.

After calculating the degrees of freedom $g'$ for each comparison r by
assuming that:

$$g' = \mathrm{Int}\left[\frac{(s_1^2/g_1 + s_2^2/g_2)^2}{(s_1^4/g_1^3) + (s_2^4/g_2^3)}\right] \qquad (16)$$

since the distribution of the Welch test for two samples is
approximated by the Student's t distribution with $g'$ degrees of
freedom, it is possible to determine the confidence intervals for
$\psi_r$, as reported in Table 8.

In this Table the symbol (*) marks all non-significant differences,
i.e. those whose confidence intervals contain 0.


Table 8. Scheme for computing the confidence intervals for $\psi_r$.

Comparisons                      $g'$   $t_{g';\,\gamma/2}$   $\sqrt{\hat{\sigma}_i^2/n_i + \hat{\sigma}_j^2/n_j}$   Intervals for $\psi_r$
(Economics) vs. (Humanities)     241    2.660212887    0.122207554    -0.543 ± 0.325
(Economics) vs. (Sciences)       213    2.663124486    0.112433304    -0.349 ± 0.299
(Economics) vs. (Law)            234    2.660874940    0.100293053    -0.165 ± 0.267 (*)
(Humanities) vs. (Sciences)      267    2.658059888    0.134246643     0.194 ± 0.357 (*)
(Humanities) vs. (Law)           266    2.658134858    0.124256145     0.378 ± 0.330
(Sciences) vs. (Law)             239    2.660398057    0.114656754     0.184 ± 0.305 (*)

After reviewing the intervals contained in the last column of Table 8,
we can conclude that:
- the graduates of the Faculty of Economics express on average
evaluations (about the effectiveness of the university education
they received in terms of job exploitability) that are significantly
less positive than those expressed by the graduates of the Faculties
of Humanities and of Mathematical, Physical and Natural Sciences;

- the graduates of the Faculty of Humanities express on average
evaluations that are more positive than those expressed by the
graduates of the Faculty of Law;

- all the other differences are not significant at the prefixed level.
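
The multiple comparisons can be retraced with the Python sketch below. It is an illustration under assumptions: the degrees of freedom use $g_i = n_i - 1$ in the Welch-Satterthwaite form, the standard error uses $n_i$ as in the column header of Table 8, and the critical values of Student's t are copied from Table 8 instead of being recomputed. All names are hypothetical.

```python
import math
from itertools import combinations

NAMES = ["Economics", "Humanities", "Sciences", "Law"]
MEANS = [0.651102337, 1.194067891, 1.000104876, 0.816531381]
VARIANCES = [0.477689660, 1.523668441, 0.943724575, 0.739451979]
SIZES = [100, 150, 120, 140]
# Critical values t_{g'; gamma/2} copied from Table 8 (not recomputed here)
T_CRIT = {(0, 1): 2.660212887, (0, 2): 2.663124486, (0, 3): 2.660874940,
          (1, 2): 2.658059888, (1, 3): 2.658134858, (2, 3): 2.660398057}


def compare(i, j):
    """Degrees of freedom, difference, half-width and significance flag."""
    gi, gj = SIZES[i] - 1, SIZES[j] - 1
    a, b = VARIANCES[i] / gi, VARIANCES[j] / gj
    dof = int((a + b) ** 2 / (a ** 2 / gi + b ** 2 / gj))
    se = math.sqrt(VARIANCES[i] / SIZES[i] + VARIANCES[j] / SIZES[j])
    diff = MEANS[i] - MEANS[j]
    half = T_CRIT[(i, j)] * se
    return dof, diff, half, abs(diff) > half


results = {pair: compare(*pair) for pair in combinations(range(4), 2)}
significant = [f"{NAMES[i]} vs. {NAMES[j]}"
               for (i, j), r in results.items() if r[3]]
```

A difference is flagged as significant when its confidence interval does not contain 0, that is, when the absolute difference exceeds the half-width.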


Conclusions 


The problem of comparing the average evaluations expressed at ordinal
level by two or more groups of individuals about a stimulus phenomenon
is formulated in this paper by means of the objective function method,
and it is then solved by using genetic algorithms.

The objective function method is the least subjective among the methods
already present in the literature, because even the choice of the most
appropriate optimizing function is univocally linked to the particular
kind of problem to be solved.

As a matter of fact, we have shown that if $u_j$ is the (unknown)
numerical value to be assigned to the j-th attribute, and if
$x_j = u_{j+1} - u_j \ge 0$, then the objective function to be used to
compare the average evaluations expressed by two groups is the linear
function (which represents the difference of the averages calculated by
using the quantification procedure). If, instead, the average
evaluations are expressed by more than two groups, then the objective
function to be used is the quadratic function (which represents the
variance of the averages of the groups).

However, although the quantifications that can be used are virtually
innumerable, the problem of comparing average evaluations ultimately
comes down to the analysis of the following results:

- if the differences $F_{2j} - F_{1j}$ between the distribution
functions of two groups (i.e. the components of the vector a) all have
the same sign, then the evaluations expressed by one group are always
on average more positive than those expressed by the other group,
whatever quantification is chosen;

- if the components of the vector a are discordant, then there are
quantifications that make the average of the evaluations of the first
group higher than the average of the second group, and quantifications
that make it lower: therefore, the observation of the vector a is not
sufficient to make a decision. When this happens, considering that,
among the innumerable quantifications, there is only one that maximizes
the objective function and, for the same reason, only one that
minimizes it, we have decided to use the two respective distributions
to make this decision. So this paper is intended to provide a criterion
that starts from the comparison of the averages in the maximizing and
minimizing distributions and allows us to decide whether or not to
accept the hypothesis that the average evaluations are equivalent, even
when the components of the vector a have discordant signs.


Besides, thanks to its objectiveness, the method we propose determines
autonomously whether several categories can be aggregated together in
order to obtain an optimum quantification.

The usefulness and the goodness of our method are confirmed by the
results obtained in the applications considered: in particular, the
comparisons concerned the evaluations expressed by a sample of
university graduates about the effectiveness of the university
education they received in terms of job exploitability (the interviewed
people were divided according to the Faculty they attended and to their
current job condition).

Indeed, the quantification obtained with the method proposed here on
those verbal evaluations leads to the same conclusions that we had
drawn by analysing the scores, on a scale from 1 to 100, that the
interviewed people assigned to the same survey subject. This can be
considered an indirect validation of the method, which is able to
provide findings similar to those provided by much more informative
data, though it has more meager statistical distributions to study.




4 
Some conclusive observation 


Francesco D. d’Ovidio 


The problem that the quantification methods discussed in the previous
pages, like others, do not actually solve (except to a limited extent)
is well known: an objective measure must not depend on the application
context, in the sense that it must be invariant both with respect to
the sample of respondents and with respect to the characteristics of
the items. The length of a table remains the same whoever the measuring
subject is, as long as he uses a correct meter and does not have severe
optical problems, wherever it is measured (in a shop showroom or at
home), and even after some time.

For a correct estimate of the quantities involved, therefore, the re- 
searcher needs to transcend both the particular context in which the 
measurement is taken and the instrument used to obtain it. 

Consequently, for a correct treatment of the answers obtained, a
technique is needed which, for example, provides a joint estimate of
the ability of the subjects (in order to address the inhomogeneity
between the answers), of the reproducibility of the measures (in order
to manage the changing circumstances) and of the difficulty of the
items (in order to correctly estimate the differences between different
items or between different levels of the same item).

In other words, a statistical model is needed that can provide
information on:

- which relationship exists between response frequencies and the
probability of obtaining a given response;

- which relationship exists, in terms of probability, between observed
and expected responses;

- which error is associated with the estimated measures.

One of the advantages deriving from this approach (and not the least
interesting) is the passage from models of the perceptive type to
models less bound to subjectivity (non-perceptivistic). Various models
have been developed which, overcoming the limitations of item analysis,
are configured as item response models (Item Response Theory)¹. IRT, of
which Rasch analysis is the spearhead, represents an important
evolution compared to the classical theory of tests when it is
necessary to deal with replies to pre-existing questionnaires.
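To make the idea of a joint treatment of subject ability and item
difficulty more concrete, the following minimal sketch evaluates the
response probability of the dichotomous Rasch model,
P = exp(theta - b) / (1 + exp(theta - b)). The abilities theta and the
difficulties b are purely illustrative assumptions and are not estimated
from any data set discussed in this book.

```matlab
% Dichotomous Rasch model: probability of a positive response as a
% function of subject ability (theta) and item difficulty (b).
% theta and b are hypothetical values chosen only for illustration.
theta = [-1 0 1 2]';       % abilities of four subjects (column vector)
b     = [-1 0.5 2];        % difficulties of three items (row vector)

% Difference theta(s)-b(i) for every subject/item pair (4-by-3 matrix)
D = theta*ones(1,length(b)) - ones(length(theta),1)*b;

% P(s,i): probability that subject s gives a positive response to item i
P = exp(D)./(1+exp(D));
disp(P)
```

When ability and difficulty coincide the probability is exactly 0.5, and
within each column (item) it grows with the ability of the subject.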

However, even within its limits, the method here proposed has var- 
ious advantages: 


1) on ordinal scales (for example, judgements or attitudes), the
   application of the method allows the researchers to refine the
   scales by redefining the categories themselves;

2) like other item-response models, it allows the construction of
   scales with valid characteristics of internal coherence and
   unidimensionality;

3) a scale able to provide measures on an interval scale makes it
   possible to overcome the problems of non-comparability of the
   ordinal and behavioral scales, making the use of these scales more
   acceptable and correct in different areas;

4) the values obtained from the Genetic Scaling method, ultimately, are
   quite close to measures expressed on an interval scale, and this
   allows the researchers to apply (without prejudice to the other
   application conditions) useful statistical methodologies in
   subsequent analyses, for example the study of covariance (dependence
   or interdependence with respect to characteristic variables of the
   sample, factor analysis, etc.).


¹ The previously described techniques belong to the large family of Item Analysis
Models; the Item Analysis models, although quite simple to construct and explain,
present some problems: first of all, the latent abilities of the subjects must be
known; moreover, the characteristics of the items and the abilities of the subjects
are analyzed independently, and no relationship is established between them.




Bibliographic references 


Cammarata S. (1994). Sistemi fuzzy: un’applicazione di successo dell’intelli- 
genza artificiale, Etaslibri, Milano. 


Chipperfield A., Fleming P., Fonseca C. (1995). Genetic Algorithm Toolbox, 
Department of Automatic Control and Systems Engineering, University of 
Sheffield: http://www.shef.ac.uk. 


Covitti A., Delvecchio G., Neri F., Sylos Labini M. (2003). A “Quasi-Genetic” 
Algorithm for Searching for the Dangerous Areas Generated by a Ground- 
ing System, Proceedings of the XI International Symposium on Electro- 
magnetic Fields in Electrical Engineering - ISEF 2003, 18-20 September 
2003, Maribor, Slovenia: 713-718. 


Crocetta C., Toma E. (2003). Un cruscotto di indicatori per la valutazione della
didattica nell’Università di Foggia, in: L. Fabbris (ed.) LAID-OUT: scoprire
i rischi con l’analisi di segmentazione, CLEUP, Padova: 159-172.


Delvecchio F. (1984). Dalla qualità alla quantità, Quaderni della Scuola di
Statistica dell’Università di Bari, 9.


Delvecchio F. (2010). Statistica per l’analisi di dati multidimensionali, 
CLEUP, Padova. 


Delvecchio F. (2015). Statistica per l'analisi dei fenomeni sociali, CLEUP, Pa- 
dova. 


Delvecchio G., Neri F., Sylos Labini M. (2002). A Genetic Algorithm
Method for Determining the Maximum Touch Voltage Generated by a
Grounding System, Proceedings of the VI-th International Workshop on
Optimisation and Inverse Problems in Electromagnetism - OIPE 2002,
12-14 September 2002, Lodz, Poland. (Selected and also published in
“Optimization and Inverse Problems in Electromagnetism”, Kluwer
Academic Publishers, August 2003, 38: 85-92).


Delvecchio F., Delvecchio G. (2004). La quantificazione mediante algoritmi
genetici come strumento per il confronto di giudizi medi espressi su scala
ordinale. In: E. Aureli (ed.), Le strategie metodologiche per lo studio della
transizione Università-lavoro, CLEUP, Padova: 199-220.


Feltovich N. (2003). Critical Values for the Robust Rank-Order Test, Depart- 
ment of Economics, University of Houston (TX), USA. 


Fligner M. A., Policello G. E. II (1981). Robust Rank Procedures for the
Behrens-Fisher Problem, Journal of the American Statistical Association,
March 1981, vol. 76, Theory and Methods Section, 373: 162-168.




Fligner M. A., Policello G. E. II, Randles R. H., Wolfe D. A. (1980). An
Asymptotically Distribution-Free Test for Symmetry Versus Asymmetry,
Journal of the American Statistical Association, March 1980, vol. 75,
Theory and Methods Section, 369: 168-172.


Ghizzetti A. (1952). Complementi ed esercizi di Analisi Matematica, Vol. I, 
Libreria Eredi V. Veschi. 


Goldberg D. E. (1989). Genetic Algorithms in Search, Optimization, and
Machine Learning, Addison-Wesley Publishing Company, Inc.


Griffiths A. J. F., Lewontin R. C., Miller J. H., Suzuki D. T. (1989). An intro- 
duction to genetic analysis, 4th revised edition, W.H.Freeman & Co, Ltd. 
(a division of Macmillan Publishers, Ltd., UK & US) 


Guttman L. A. (1941). The quantification of a class of attributes: A theory and 
method of scale construction. In: P. Horst (ed.), The prediction of personal 
adjustment. Social Science Research Council, New York. 


Herzel A. (1974a). Un criterio di quantificazione - Aspetti statistici, Metron, 
XXXII: 3-54. 


Herzel A. (1974b). Un criterio di quantificazione - Aspetti matematici, Metron,
XXXII: 55-121.


Holland J. H. (1992). Adaptation in natural and artificial Systems, The MIT 
Press, Cambridge (MA). 


Kruskal J. B. (1964). Non-metric multidimensional scaling: a numerical
method. Non-metric hypothesis. Psychometrika, 29: 115-129.


Kruskal J. B. (1965). Analysis of factorial experiments by estimating
monotone transformations of the data. Journal of the Royal Statistical
Society, Series B, 27: 251-263.

Likert R. (1932). A technique for the measurement of attitudes. Archives of 
Psychology, 140: 5-53. 


Marbach G. (1974). Sulla presunta equidistanza degli intervalli nelle scale di 
valutazione. Metron, XXXII, n. 1-4. 


Rasch G. (1960). Probabilistic models for some intelligence and attainment
tests. Danish Institute for Educational Research, Copenhagen. (Expanded
edition, with foreword and afterword by B. D. Wright (1980), MESA Press,
Chicago, IL).


Savino M. (1992). Fondamenti di scienza delle misure, NIS, Roma. 


Siegel S., Castellan N. J. Jr (1988). Nonparametric Statistics for the
Behavioral Sciences, McGraw-Hill, New York.




Thurstone L. L. (1925). A method of scaling psychological and educational
tests. Journal of Educational Psychology, 16(7): 433-451.


Thurstone L. L. (1929). The Measurement of Psychological Value. In: T.V. 
Smith and W.K. Wright (eds.), Essays in Philosophy by Seventeen Doctors 
of Philosophy of the University of Chicago, Open Court, Chicago (IL). 

Torgerson W. S. (1952). Multidimensional scaling - Theory and method.
Psychometrika, 17: 401-419.


Torgerson W. S. (1967). Theory and Methods of Scaling, Wiley and Sons,
N.Y.


Welch B. L. (1951). On the comparison of several mean values: an alternative 
approach, Biometrika, 38: 330-336. 




Appendix: 
Matlab Code 


(credits: G. Delvecchio, University of Bari, Italy) 


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% herzel.m Indeterminate quantification.
% The first method maximizes (or minimizes) all the
% differences of the partial averages, with the constraint
% that the total variance is unitary.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% The purpose of the program is to determine the values of all
% distances X of the U modalities (on an oriented straight line)
% of the Herzel "Indeterminate Quantification" method, using
% genetic algorithms.
% k = number of distances X between consecutive modalities
% k + 1 = number of modalities (U) to be positioned on the
% oriented line
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%


clear all 

%clc 

close all 

disp(date) 

% 

% INPUT DATA (example) 

% 

n1=[6 30 45 24 5]'     % absolute frequencies of 1st partial distribution
n2=[4 10 45 20 11]'    % absolute frequencies of 2nd partial distribution
%
cost=0;                % arbitrary constant used to derive U
% **********************************************************
% SETTING:
%
options=foptions([1 1e-3]);      % old Matlab statement, for back compatib.
% options(1)=0;                  % PRINTING (Default PRINTING=1)
                                 % print the results table
options(2)=0.0001;               % terminate (Default terminate=0.001)
% options(10): index stopping generation in the algorithm
%              (output value)
options(11)=100*(length(n1)-1);  % size_of population (Default size_pop=30)
% options(12)=1;                 % Pc: probability of crossover (Default Pc=1)
options(13)=0.04;                % Pm: probability of mutation
options(14)=300;                 % max_gen: maximum no. of generations
                                 % (Default max_gen=100)
%
bits=30;                         % number of bits: initialization
% delta=0.000000001
vlb=0;                           % angles' lower limit (scalar, initialization)
vub=pi/2;                        % angles' upper limit (scalar, initialization)
                                 % k1 k2: parameters of the fitness function:
                                 % Fit=k1*M1M2+k2
k1=100;                          % k1>0 compute maximizing distribution;
%                                % k1<0 compute minimizing distribution
k2=100;
phi_max=pi/2;                    % angles used to 3-D plot in case k=3
th_max=pi/2;


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% INITIAL CALCULATIONS
n=n1+n2;          % absolute frequencies of the overall distribution
k=length(n)-1     % number of modalities - 1
N1=sum(n1);       % size of the 1st distribution
N2=sum(n2);       % size of the 2nd distribution
N=N1+N2;          % total size
f1=n1/N1;         % relative frequencies of the 1st partial distribution
f2=n2/N2;         % relative frequencies of the 2nd partial distribution
f=n/N;            % relative frequencies of the overall distribution
F1=cumsum(f1)     % relative cumulative freq. of the 1st partial distrib.
F2=cumsum(f2)     % relative cumulative freq. of the 2nd partial distrib.
F=cumsum(f)       % relative cumulative freq. of the overall distrib.
a=F2-F1           % auxiliary variable
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%


if all(a(1:k)<=0)
    disp(' ')
    disp('case a<=0')
    ratio=a./sqrt(F.*(1-F));
    ratio=ratio(1:k);
    disp(blanks(2)')
    disp('maximum of the differences of the partial means M1-M2 with the Herzel method')
    MaxM1M2=max(ratio)      % maximum of the differences of
                            % the partial means M1-M2
    t=find(ratio==MaxM1M2);
    Ft=F(t);
    disp(' Solution by Herzel''s method')
    X=[zeros(t-1,1); 1/sqrt(Ft*(1-Ft)); zeros(k-t,1)]  % solution vector
                                                       % (column vector)
    disp(blanks(2)')
    disp('**************')
end
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Building the B matrix of the quadratic form: X'*B*X=1
for i=1:k
    for j=1:k
        B(i,j)=F(i)*(1-F(j));
    end
end
B=triu(B)+triu(B,1)';
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% GENETIC ALGORITHM
bits=bits*ones(1,k-1);   % no. of bits: BITS is a row vector with one
                         % row and (length(VLB)) columns
vlb=vlb*ones(1,k-1);     % angles' lower limit (row vector)
vub=vub*ones(1,k-1);     % angles' upper limit (row vector)
delta=(vub-vlb)./((2.^bits)-1)
%bits=ceil(log2((vub-vlb)/delta +1))
[Angles,stats,options,bf,fgen,lgen] = ...
    genetic('fitnessM1M2',[],options,vlb,vub,bits,k,F,a,B,k1,k2);
% stats: [max min mean std] for each generation
% options: options used
% bf: fitness of individual X (i.e.: best fitness)
% fgen: first generation population
% lgen: last generation population
%
Angles
disp('bf: fitness of individual X (i.e.: best fitness)')
bf=(bf-k2)/k1            % M1-M2
[X,M1M2]=CalcM1M2(Angles',k,F,a,B)

figure
%plot(stats(:,[1 3]))
abscissa=0:size(stats,1)-1;
plot(abscissa,stats(:,1),':',abscissa,stats(:,3),'-')
xlabel('Generations')
ylabel('Fitness')
legend('max','mean')
grid
% End of the GENETIC ALGORITHM
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
disp('quantifying vector')
U=[cost; cost+cumsum(X)]   % quantifying vector
disp('average of the first distribution')
M1=sum(f1.*U)              % average of the 1st distribution
disp('average of the second distribution')
M2=sum(f2.*U)              % average of the 2nd distribution
disp('difference of the averages of the two distributions: M1-M2')
M1M2=M1-M2
%disp('anova')
%anova1([n1 n2])
disp('values for truncation of X')
dX=troncquant1(a,0.0001)
% return                   % Routine's STOP (optional)
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% PLOT (case k=3)
% Note: in the case k = 3, only 2 independent variables exist (X1 and X2),
% since the third (Xk = X3) can be obtained from the constraint that the
% total variance is unitary. Therefore the plotting is possible.
if k==3
    Npt=50;                 % number of points for plotting process
    [phi,th]=meshgrid(linspace(0,phi_max,Npt),linspace(0,th_max,Npt));
    phi=phi(:)';            % row vector
    th=th(:)';              % row vector
    Angles=[phi;th];
    [X,M1M2]=CalcM1M2(Angles,k,F,a,B);
    x=X(1,:);
    y=X(2,:);
    z=X(3,:);
    x=reshape(x,Npt,Npt);
    y=reshape(y,Npt,Npt);
    z=reshape(z,Npt,Npt);
    M1M2=reshape(M1M2,Npt,Npt);
    figure
    %surf(x,y,z)                       % for colour plot
    mesh(x,y,z), colormap([0 0 0])     % black & white plot
    xlabel('{\it x}_{1}')
    ylabel('{\it x}_{2}')
    zlabel('{\it x}_{3}')
    title('Ellipsoid')
    figure
    %surf(x,y,M1M2)                    % for colour plot
    mesh(x,y,M1M2), colormap([0 0 0])  % black & white plot
    xlabel('{\it x}_{1}')
    ylabel('{\it x}_{2}')
    zlabel('{\it x}_{3}')
    title('Difference of averages')
    figure
    contour(x,y,M1M2,20)
    xlabel('{\it x}_{1}')
    ylabel('{\it x}_{2}')
    title('Difference of averages')
    grid
end                         % end of the plot procedure (k=3)
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
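The role of the auxiliary vector a and of the matrix B in herzel.m can
be checked numerically. The sketch below reuses the example frequencies
of the listing and verifies, for an arbitrary trial vector X (an
assumption made only for this check, not a solution of the problem),
that the difference of the partial means equals a'*X and that the total
variance of the scores equals X'*B*X. B is written here directly in its
symmetric form F(min(i,j))*(1-F(max(i,j))), which is what the triu
construction of the listing produces.

```matlab
% Example data of herzel.m
n1 = [6 30 45 24 5]';   n2 = [4 10 45 20 11]';
n  = n1 + n2;           k  = length(n) - 1;
f1 = n1/sum(n1);  f2 = n2/sum(n2);  f = n/sum(n);
F1 = cumsum(f1);  F2 = cumsum(f2);  F = cumsum(f);
a  = F2 - F1;                        % auxiliary variable of the listing

% Matrix of the variance constraint X'*B*X = 1 (symmetric form)
B = zeros(k,k);
for i = 1:k
    for j = 1:k
        B(i,j) = F(min(i,j))*(1 - F(max(i,j)));
    end
end

X = [1 2 0.5 3]';                    % arbitrary trial distances (assumption)
U = [0; cumsum(X)];                  % scores of the k+1 modalities (cost = 0)

M1M2_direct = sum(f1.*U) - sum(f2.*U);      % M1-M2 from the scores
M1M2_linear = a(1:k)'*X;                    % M1-M2 as linear form a'*X

var_direct = sum(f.*U.^2) - sum(f.*U)^2;    % variance of the scores
var_quad   = X'*B*X;                        % variance as quadratic form

fprintf('M1-M2: %.6f (direct)  %.6f (linear form)\n', M1M2_direct, M1M2_linear);
fprintf('Var:   %.6f (direct)  %.6f (quadratic form)\n', var_direct, var_quad);
```

Both pairs of values coincide up to rounding, which is why the program
can work on the distances X alone, with a linear objective and a
quadratic constraint.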




%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% herzel2.m Indeterminate quantification of squares
% The first method maximizes (or minimizes) the weighted
% square of differences of the partial averages with the total
% average, with the constraint that the total variance is unitary
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% The purpose of the program is to determine the values of all
% distances X of the U modalities (on an oriented straight line)
% of the Herzel "Indeterminate Quantification" method, using
% genetic algorithms.
% k = number of distances X between consecutive modalities
% k + 1 = number of modalities (U) to be positioned on the
% oriented line
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%
clear all
%clc
close all
disp(date)
%
% INPUT DATA (example)
%
% Absolute frequencies of the distribution
% Each column represents a distribution
% Each row is a modality
%     n1   n2   n3
nr=[ 6 8 Qyave % modality 1 
3 12 63... % modality 2 
1 15 F225 % modality 3 
0 5 30] % modality 4 
cost=0;                       % arbitrary constant used to derive U
alpha = 0.05                  % significance level of statistical tests
stamp = 0                     % prints the partial results of the statistical tests
% **********************************************************
% SETTINGS
options=foptions([1 1e-3]);   % old Matlab statement, for back compatib.
% options(1)=0;               % PRINTING (Default PRINTING=1)
                              % print the results table
options(2)=0.0001;            % terminate (Default terminate=0.001)
% options(10): index stopping generation in the algorithm
%              (output value)
options(11)=300;              % size_of population (Default size_pop=30)
% options(12)=1;              % Pc: probability of crossover (Default Pc=1)
options(13)=0.04;             % Pm: probability of mutation
options(14)=300;              % max_gen: maximum no. of generations
                              % (Default max_gen=100)
bits=30;                      % no. of bits: initialization
% delta=0.000000001
vlb=0;                        % angles' lower limit (scalar, initialization)
vub=pi/2;                     % angles' upper limit (scalar, initialization)
                              % k1 k2: parameters of the fitness function:
                              % Fit=k1*XAX+k2
k1=100;                       % k1>0 compute maximizing distribution;
                              % k1<0 compute minimizing distribution
k2=100;
phi_max=pi/2;                 % angles used to 3-D plot in case k=3
th_max=pi/2;
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% INITIAL CALCULATIONS 


q=size(nr, 2); % number of distributions 
n=sum(nr,2) % absolute frequencies of the overall distribution 
k=length(n)-1 % number of modalities - 1 
Nr=sum(nr) % size of each distribution 
N=sum(Nr) % total size 
pr=Nr/N % relative size (weight) of each distribution
fr=nr./(ones(k+1,1)*Nr) % relative frequencies of each partial distribution 
f=n/N % relative frequencies of the overall distribution 
Fr=cumsum(fr) % relative cumulative freq. of each partial distrib. 
F=cumsum(f) % relative cumulative freq. of the overall distrib. 
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Building the A matrix of the quadratic form: X'*A*X
for i=1:k
    for j=1:k
        A(i,j)=sum(pr.*Fr(i,:).*Fr(j,:))-F(i)*F(j);
    end
end
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Building the B matrix of the quadratic form: X'*B*X=1
for i=1:k
    for j=1:k
        B(i,j)=F(i)*(1-F(j));
    end
end
B=triu(B)+triu(B,1)';
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% GENETIC ALGORITHM
bits=bits*ones(1,k-1);   % no. of bits: BITS is a row vector with one
                         % row and (length(VLB)) columns
vlb=vlb*ones(1,k-1);     % angles' lower limit (row vector)
vub=vub*ones(1,k-1);     % angles' upper limit (row vector)
delta=(vub-vlb)./((2.^bits)-1)
%bits=ceil(log2((vub-vlb)/delta +1))
[Angles,stats,options,bf,fgen,lgen] = ...
    genetic('fitnessXAX',[],options,vlb,vub,bits,k,F,A,B,k1,k2);
% stats: [max min mean std] for each generation
% options: options used
% bf: fitness of individual X (i.e.: best fitness)
% fgen: first generation population
% lgen: last generation population
Angles
bf=(bf-k2)/k1            % X'*A*X
[X,XAX]=CalcXAX(Angles',k,F,A,B)


figure
%plot(stats(:,[1 3]))
abscissa=0:size(stats,1)-1;
plot(abscissa,stats(:,1),':',abscissa,stats(:,3),'-')
xlabel('Generations')
ylabel('Fitness')
legend('max','mean')
grid
% End of the GENETIC ALGORITHM
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%


U=[cost; cost+cumsum(X)] % quantifying vector 
Distr=fr.*repmat(U,1,q) % distributions 
M=sum(fr.*repmat(U,1,q)) % distributions’ means (row vector) 
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% building vector Mi-Mj
diff=combntns(1:q,2);      % all pairs (i,j) of distributions, i<j
M=M';
diffmean=[diff, M(diff(:,1))-M(diff(:,2))];
disp('    i     j    Mi-Mj')
%disp(sprintf('%5.0f %5.0f %12.4f', diffmean(:,1),diffmean(:,2),diffmean(:,3)))
disp(diffmean)
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% building matrix Fi-Fj
aus2='';                   % size initialization
for i=1:size(diff,1)
    aus1(:,i)=Fr(:,diff(i,1))-Fr(:,diff(i,2));
    aus2=strcat(aus2, int2str(diff(i,1)), '_', int2str(diff(i,2)), ' ');
end
disp('Fi-Fj')
disp(aus2)
disp(aus1)
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% PLOT (case k=3)
% Note: in the case k = 3, only 2 independent variables exist (X1 and X2),
% since the third (Xk = X3) can be obtained from the constraint that the
% total variance is unitary. Therefore the plotting is possible.
if k==3
    Npt=50;                 % number of points for plotting process
    [phi,th]=meshgrid(linspace(0,phi_max,Npt),linspace(0,th_max,Npt));
    phi=phi(:)';            % row vector
    th=th(:)';              % row vector
    Angles=[phi;th];
    [X,XAX]=CalcXAX(Angles,k,F,A,B);
    x=X(1,:);
    y=X(2,:);
    z=X(3,:);
    x=reshape(x,Npt,Npt);
    y=reshape(y,Npt,Npt);
    z=reshape(z,Npt,Npt);
    XAX=reshape(XAX,Npt,Npt);
    figure
    surf(x,y,z)                        % colour plot
    xlabel('{\it x}_{1}')
    ylabel('{\it x}_{2}')
    zlabel('{\it x}_{3}')
    title('Ellipsoid')
    figure
    %surf(x,y,XAX)                     % for colour plot
    mesh(x,y,XAX), colormap([0 0 0])   % black & white plot
    xlabel('{\it x}_{1}')
    ylabel('{\it x}_{2}')
    zlabel('{\it x}_{3}')
    title('Quadratic objective function')
    figure
    contour(x,y,XAX,20)
    xlabel('{\it x}_{1}')
    ylabel('{\it x}_{2}')
    title('XAX')
    grid
end
% end of the plot procedure (k=3)
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
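The quadratic form maximized in herzel2.m can be read as the weighted
sum of the squared deviations of the partial means from the total mean.
The following sketch checks this numerically. The frequency table nr and
the trial vector X are hypothetical values introduced only for
illustration (they are not the example data of the listing); the
construction of A follows the listing.

```matlab
% Hypothetical frequency table: rows = modalities, columns = distributions
nr = [ 6  8  4
       3 12  9
       1 15 20
       0  5 27];
q  = size(nr,2);             % number of distributions
k  = size(nr,1) - 1;         % number of modalities - 1
Nr = sum(nr);   N = sum(Nr); % sizes of the distributions, total size
pr = Nr/N;                   % weights of the distributions
fr = nr./(ones(k+1,1)*Nr);   % relative frequencies of each distribution
f  = sum(nr,2)/N;            % relative frequencies of the overall distribution
Fr = cumsum(fr);   F = cumsum(f);

% Matrix A of the quadratic form X'*A*X, built as in herzel2.m
A = zeros(k,k);
for i = 1:k
    for j = 1:k
        A(i,j) = sum(pr.*Fr(i,:).*Fr(j,:)) - F(i)*F(j);
    end
end

X = [0.7 1.1 0.4]';          % arbitrary trial distances (assumption)
U = [0; cumsum(X)];          % scores of the modalities (cost = 0)

M       = sum(fr.*(U*ones(1,q)));   % partial means (row vector)
Mtot    = sum(f.*U);                % total mean
between = sum(pr.*(M - Mtot).^2);   % weighted squared deviations
quad    = X'*A*X;                   % same quantity as quadratic form

fprintf('weighted squared deviations: %.6f   X''*A*X: %.6f\n', between, quad);
```

Under the constraint X'*B*X = 1 this quantity is the share of the total
variance explained by the differences between the distributions.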




%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%
% troncquant1.m
%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% The purpose of the program is to determine the values dX
% for which the values of X can be truncated in such a way as
% to have a variation dy of the objective function (y = a'*X)
% such that dy < eps (where eps is a value as small as one wants)
% k = number of distances X between consecutive modalities
% k + 1 = number of modalities (U) to be positioned on the
% oriented line
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function dX=troncquant1(a,eps)
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% INPUT
% a:   vector "a" of the linear form: a'*X
% eps: threshold value (scalar)
%
% OUTPUT
% dX:  values for which it is possible to truncate the values of X in such
%      a way as to have dy < eps (where y = a'*X)
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
k=length(a)-1;             % number of modalities - 1
a=a(1:k);
dX=eps/sum(abs(a));
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
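The rule used by troncquant1 follows from the bound
|a'*X - a'*Xt| <= sum(|a|) * max|X - Xt|: if every component of X is
changed by less than dX = eps/sum(|a|), the objective changes by less
than eps. A minimal sketch of this check is given below. The vectors a
and X are hypothetical values, and the truncation to multiples of dX is
only one possible way of applying the threshold (an assumption, since
the listing does not show how dX is used afterwards).

```matlab
a     = [-0.0101 -0.1717 -0.0808 -0.0768 0]';   % hypothetical vector a
eps_y = 1e-4;                       % tolerance on y = a'*X
k     = length(a) - 1;              % number of modalities - 1
ak    = a(1:k);
dX    = eps_y/sum(abs(ak));         % same rule as in troncquant1

X      = [0.3 1.2 0.8 0.5]';        % hypothetical solution vector
Xtrunc = floor(X/dX)*dX;            % truncate each component to a multiple of dX
dy     = abs(ak'*X - ak'*Xtrunc);   % resulting change of the objective

fprintf('dX = %g, dy = %g (tolerance %g)\n', dX, dy, eps_y);
```

The tolerance is named eps_y here to avoid shadowing the built-in
constant eps, which the original function uses as an argument name.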




%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%
% CalcM1M2.m
% This function calculates the difference of the partial means: M1-M2
%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function [X,M1M2]=CalcM1M2(Angles,k,F,a,B)
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% INPUT
% Angles: angles of polar coordinates (in radians): [phi; th]
%         In the case of a single point:
%           phi (scalar)
%           th (column vector of length n = k-2)
%           X (column vector of length k)
%         In the case of N points:
%           phi (row vector of length N)
%           th (matrix with n = k-2 rows and N columns)
%           X (matrix with k rows and N columns)
% k: number of modalities - 1
% F: cumulative relative frequencies of the overall distribution
% a: auxiliary variable
% B: central matrix of the quadratic form: X'*B*X=1
%
% OUTPUT
% M1M2: difference of partial averages: M1-M2
%         In the case of a single point:
%           X (column vector of length k)
%         In the case of N points:
%           X (matrix with k rows and N columns)
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
phi=Angles(1,:);
th=Angles(2:k-1,:);
%phi=phi(:)';
%th=th(:)';
X=sph2cartN(ones(size(phi)),phi,th);
aus=X'*B*X;
aus=diag(aus);
r=sqrt(1./aus);
X=X*diag(r);
M1M2=a(1:k)'*X;
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%




%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%
% CalcXAX.m
% This function calculates the quadratic form X'*A*X
%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function [X,XAX]=CalcXAX(Angles,k,F,A,B)
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% INPUT
% Angles: angles of polar coordinates (in radians): [phi; th]
%         In the case of a single point:
%           phi (scalar)
%           th (column vector of length n = k-2)
%           X (column vector of length k)
%         In the case of N points:
%           phi (row vector of length N)
%           th (matrix with n = k-2 rows and N columns)
%           X (matrix with k rows and N columns)
% k: number of modalities - 1
% F: cumulative relative frequencies of the overall distribution
% A: central matrix of the quadratic form: X'*A*X
% B: central matrix of the quadratic form: X'*B*X=1
%
% OUTPUT
% XAX: quadratic form X'*A*X
%         In the case of a single point:
%           X (column vector of length k)
%         In the case of N points:
%           X (matrix with k rows and N columns)
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
phi=Angles(1,:);
th=Angles(2:k-1,:);
X=sph2cartN(ones(size(phi)),phi,th);
aus=X'*B*X;
aus=diag(aus);
r=sqrt(1./aus);
X=X*diag(r);
XAX=X'*A*X;
XAX=diag(XAX);
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
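CalcM1M2 and CalcXAX share the same device: a direction identified by
angles in [0, pi/2] is rescaled by the radius r = 1/sqrt(d'*B*d), so
that the resulting point X has non-negative components and satisfies
X'*B*X = 1 by construction. The sketch below illustrates it for k = 3.
The cumulative frequencies F, the two angles and, above all, the
spherical convention used to build the direction d are assumptions:
sph2cartN is not reproduced in this part of the appendix and may order
or define the angles differently.

```matlab
F = [0.2 0.5 0.8]';                  % hypothetical cumulative frequencies
k = 3;                               % number of modalities - 1
B = zeros(k,k);                      % matrix of the constraint X'*B*X = 1
for i = 1:k
    for j = 1:k
        B(i,j) = F(min(i,j))*(1 - F(max(i,j)));
    end
end

phi = pi/6;   th = pi/3;             % hypothetical angles in [0, pi/2]

% Unit direction in the first octant (assumed convention, not sph2cartN)
d = [cos(th)*cos(phi); cos(th)*sin(phi); sin(th)];

r = sqrt(1/(d'*B*d));                % radius reaching the ellipsoid along d
X = d*r;                             % point on the ellipsoid X'*B*X = 1

fprintf('X''*B*X = %.12f\n', X'*B*X);
```

In this way the genetic algorithm only has to search a box in the space
of the angles, while the variance constraint and the non-negativity of
the distances hold automatically.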




%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%
% fitnessM1M2.m
% This function calculates the fitness of the difference of the partial
% averages: M1-M2
%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function Fit=fitnessM1M2(Angles,k,F,a,B,k1,k2)
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% INPUT
% Angles: row vector of length k-1 (in radians)
% k: number of modalities - 1
% F: cumulative relative frequencies of the overall distribution
% a: auxiliary variable
% B: central matrix of the quadratic form: X'*B*X=1
% k1 k2: parameters of the function: Fit=k1*M1M2+k2
%
% OUTPUT
% M1M2: difference of partial averages (scalar): M1-M2
% Fit: fitness value (scalar): Fit=k1*M1M2+k2
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% calculation of the difference of partial averages: M1-M2
phi=Angles(1);             % scalar
th=Angles(2:k-1)';         % column vector of length n = k-2
X=sph2cartN(1,phi,th);     % column vector of length n + 2 = k
aus=X'*B*X;                % auxiliary variable (scalar)
r=sqrt(1./aus);            % radius (scalar)
X=X*r;                     % cartesian coordinates of point P on the
                           % hyperellipsoid
M1M2=a(1:k)'*X;            % difference between the partial means (scalar)
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% calculation of Fitness of the partial averages' difference: M1-M2
Fit=k1*M1M2+k2;
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
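The fitness is an affine transformation of the objective,
Fit = k1*M1M2 + k2, which the main script inverts with bf = (bf-k2)/k1.
A plausible reading, offered here as an assumption rather than as the
authors' stated motivation, is that the offset k2 keeps the fitness
positive, so that a selection probability proportional to fitness is
well defined even when M1-M2 is negative, while the sign of k1 switches
between maximization and minimization. The objective values below are
hypothetical.

```matlab
k1 = 100;   k2 = 100;                % values used in herzel.m
M1M2 = [-0.35 -0.10 -0.02];          % hypothetical objective values of 3 individuals

Fit = k1*M1M2 + k2;                  % fitness passed to the genetic algorithm
p   = Fit/sum(Fit);                  % selection probabilities proportional to fitness

bf   = max(Fit);                     % best fitness found
best = (bf - k2)/k1;                 % back-transformation used in the main script

disp([M1M2; Fit; p])
fprintf('best objective value recovered: %g\n', best);
```

With k1 < 0 the ordering of the individuals is reversed, so the same
maximizing algorithm returns the minimizing distribution.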




%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%
% fitnessXAX.m
% This function calculates the fitness of the quadratic form: X'*A*X
%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function Fit=fitnessXAX(Angles,k,F,A,B,k1,k2)
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% INPUT
% Angles: row vector of length k-1 (in radians)
% k: number of modalities - 1
% F: cumulative relative frequencies of the overall distribution
% A: central matrix of the quadratic form: X'*A*X
% B: central matrix of the quadratic form: X'*B*X=1
% k1 k2: parameters of the function: Fit=k1*XAX+k2
%
% OUTPUT
% XAX: quadratic form X'*A*X
% Fit: fitness value (scalar): Fit=k1*XAX+k2
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% calculation of the quadratic form: X'*A*X
phi=Angles(1);             % scalar
th=Angles(2:k-1)';         % column vector of length n = k-2
X=sph2cartN(1,phi,th);     % column vector of length n + 2 = k
aus=X'*B*X;                % auxiliary variable (scalar)
r=sqrt(1./aus);            % radius (scalar)
X=X*r;                     % cartesian coordinates of point P on the
                           % hyperellipsoid
XAX=X'*A*X;                % value of the quadratic form (scalar)
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% calculation of Fitness of the quadratic form: X'*A*X
Fit=k1*XAX+k2;
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
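Both fitness functions receive angles that the genetic algorithm
represents as binary strings of `bits` bits each. The main scripts
compute the corresponding resolution as
delta = (vub-vlb)/(2^bits-1). The sketch below shows what this
resolution means under the usual fixed-point decoding of a binary
string. The decoding rule and the sample chromosome are assumptions for
illustration: the decoding actually used by genetic.m is not reproduced
in this part of the appendix.

```matlab
vlb = 0;   vub = pi/2;   bits = 30;     % settings of the main scripts
delta = (vub - vlb)/(2^bits - 1);       % smallest representable angle step

% Hypothetical 30-bit chromosome, most significant bit first
s = [1 0 1 1 0 zeros(1,25)];

% Assumed decoding: integer value of the string times the resolution
m   = sum(s.*2.^(bits-1:-1:0));         % integer encoded by the string
ang = vlb + m*delta;                    % decoded angle in [vlb, vub]

% The all-ones string decodes to the upper bound
ang_max = vlb + (2^bits - 1)*delta;

fprintf('delta = %g, decoded angle = %.9f, largest angle = %.9f\n', ...
        delta, ang, ang_max);
```

With 30 bits on an interval of length pi/2 the step is of the order of
1e-9, in line with the commented value of delta in the main scripts.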




%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Matlab modules (minimally edited by the Authors)
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%
% genetic.m
% The algorithm implemented here is taken from Genetic
% Algorithms in Search, Optimization, and Machine Learning,
% David E. Goldberg, Addison-Wesley Publishing Company, Inc.,
% 1989.
% Copyright (c) 1993 by the MathWorks, Inc.
% Andrew Potvin 1-10-93.
%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function [xopt,stats,options,bestf,fgen,lgen] = genetic(fun, ...
    x0,options,vlb,vub,bits,P1,P2,P3,P4,P5,P6,P7,P8,P9,P10)
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% GENETIC tries to maximize a function using a simple genetic algorithm.
%
% X=GENETIC('FUN',X0,OPTIONS,VLB,VUB) uses a simple (haploid) genetic
% algorithm to find a maximum of the fitness function FUN
% (usually an M-file: FUN.M).
% The user may define all or part of an initial population X0 (or supply an
% empty argument in which case an initial population will be chosen
% randomly between the lower and upper bounds VLB and VUB).
% Use OPTIONS to specify optional parameters such as population size
% and maximum number of generations produced.
% Type HELP GOPTIONS for more information.
% The default algorithm uses a fixed population size, OPTIONS(11), and
% no generational overlap.
% Three genetic operations: reproduction, crossover, and mutation are
% performed during procreation.
% The probability that an individual of the population will reproduce is
% proportional to its fitness.
%
% Individuals chosen for reproduction are mated at random.
% Mating produces two offspring (re: constant population size.)
%
% Crossover in mating occurs with probability Pc=OPTIONS(12) and
% the crossover index is randomly selected.
% Each feature of the offspring can mutate independently with probability
% Pm=OPTIONS(13). Default options are OPTIONS(11:13)=[30 1 0].
%
% The default maximum generations OPTIONS(14) is 100.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%




% INPUT
% X=GENETIC('FUN',X0,OPTIONS,VLB,VUB,BITS) allows the user to
% define the number of BITS used to code non-binary parameters
% as binary strings. Note: length(BITS) must equal length(VLB).
%
% X=GENETIC('FUN',X0,OPTIONS,VLB,VUB,BITS,P1,P2,...) allows up
% to ten arguments, P1, P2, ... to be passed directly to FUN.
% F=FUN(X,P1,P2,...).
%
% FUN      fitness function
% X0       initial population (row vector)
%          (or supply an empty argument in which case an initial
%          population will be chosen randomly between the lower
%          and upper bounds VLB and VUB)
% OPTIONS
% VLB      lower bounds (row vector)
% VUB      upper bounds (row vector)
% BITS     number of BITS used to code non-binary parameters as binary
%          strings (row vector).
%          Note: length(BITS) must equal length(VLB)
% P1, ..., P10
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% OUTPUT
% [X,STATS,OPTIONS,BESTF,FGEN,LGEN]=GENETIC(<ARGS>)
%
% STATS   - [max min mean std] for each generation
% OPTIONS - options used
% BESTF   - Fitness of individual X (i.e.: best fitness)
% FGEN    - first generation population
% LGEN    - last generation population
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Note:
% OPT_STOP used by user to halt optimization prematurely
% OPT_STEP will be true during the evaluation of the last individual's
% cost function: Often used to determine when to update graphics.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
global OPT_STOP OPT_STEP % ???
OPT_STOP = 0;
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Argument and error checking 
if nargin<4,
    error('No population bounds given.')
elseif (size(vlb,1)~=1) | (size(vub,1)~=1),
    % Remark: this will change if algorithm accommodates matrix variables
    error('VLB and VUB must be row vectors')
elseif (size(vlb,2)~=size(vub,2)),
    error('VLB and VUB must have the same number of columns.')
elseif (size(vub,2)~=size(x0,2)) & (size(x0,1)>0),
    error('X0 must have the same number of columns as VLB and VUB.')
elseif any(vlb>vub),




    error('Some lower bounds greater than upper bounds')
else
    x0_row = size(x0,1);
    for i=1:x0_row,
        % check every row of the initial population, not only the last one
        if any(x0(i,:)<vlb) | any(x0(i,:)>vub),
            error('Some initial population not within bounds.')
        end % if initial pop not within bounds
    end % for initial pop
end % if nargin<4


OG RRR ARR IKARIA AIR RRR EIA III ERI IAI IIE IIIA II III HEIDI IIIA AEA 


if nargin<6,
    bits = [];
elseif (size(bits,1)~=1) | (size(bits,2)~=size(vlb,2)),
    % Remark: this will change if algorithm accommodates matrix variables
    error('BITS must have one row and length(VLB) columns')
elseif any(bits~=round(bits)) | any(bits<1),
    error('BITS must be a vector of integers >0')
end % if nargin<6


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Form string to call for function evaluation
if ~( any(fun<48) | any(fun>122) | any((fun>90) & (fun<97)) | ...
      any((fun>57) & (fun<65)) ),
    % Only alphanumeric characters: FUN must be a function name
    evalstr = [fun '(x'];
    for i=1:nargin-6,
        evalstr = [evalstr,',P',int2str(i)];
    end % end for
    evalstr = [evalstr, ')'];
else
    evalstr = fun;
end
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Determine all options
% Remark: add another options index for type of termination criterion
if size(options,1)>1,
    error('OPTIONS must be a row vector')
else
    options = foptions(options);
    if options(11)==0,
        % Default size_pop
        options(11) = 30;
    end
    if options(12)==0,




        % Default Pc
        options(12) = 1;
    end
    if options(14)==0,
        % Default max_gen
        options(14) = 100;
    end
end

PRINTING = options(1);
terminate = options(2);
size_pop = options(11);
Pc = options(12);
Pm = options(13);
max_gen = options(14);

% Ensure valid options: e.g. Pc,Pm,size_pop,max_gen>0, Pc,Pm<1
if any([Pc Pm size_pop max_gen]<0) | any([Pc Pm]>1),
    error('Some Pc,Pm,size_pop,max_gen<0 or Pc,Pm>1')
end
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
ENCODED = any(any(([vlb; vub; x0]~=0) & ([vlb; vub; x0]~=1))) | ...
          ~isempty(bits);
if ENCODED,
    [fgen,lchrom] = encode(x0,vlb,vub,bits);
else
    fgen = x0;
    lchrom = size(vlb,2);
end


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Display warning if odd number in initial population
if rem(size_pop,2)==1,
    disp('Warning: pop_size should be even. Adding 1 to population.')
    size_pop = size_pop + 1;
end
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Form random initial population if not enough supplied by user
if size(fgen,1)<size_pop,
    fgen = [fgen; (rand(size_pop-size(fgen,1),lchrom)<0.5)];
end
xopt = vlb;
bestf = -Inf;
stats = [];   % fitness statistics, one row per generation
new_gen = fgen;
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Header display
if PRINTING>=1,




    if ENCODED,
        disp('Cost function encoding as binary successful.')
        disp(' ')
        fgen = decode(fgen,vlb,vub,bits);
    end
    disp(' Fitness statistics')
    disp('Generation Maximum Minimum Mean Std. dev.')
end


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Set up main loop
STOP_FLAG = 0;

for generation = 1:max_gen+1,
    options(10) = generation-1; % index of the generation at which the run stops
    old_gen = new_gen;

    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    % Decode first if necessary
    if ENCODED,
        x_pop = decode(old_gen,vlb,vub,bits);
    else
        x_pop = old_gen;
    end
    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

    % Get fitness of each string in population
    for i=1:size_pop,
        x = x_pop(i,:);   % evalstr refers to the lower-case variable x
        if i==size_pop,
            OPT_STEP = 1;
        else
            OPT_STEP = 0;
        end
        fitness(i) = eval(evalstr);
    end

    [max_fit, INDEX] = max(fitness);
    stats = [stats; max_fit min(fitness) mean(fitness) std(fitness)];
    if max_fit>bestf,
        bestf = max_fit;
        xopt = x_pop(INDEX(1),:);
    else
        % Remark: may want to regenerate to guarantee cost decrease
        % Remark: be careful not to get stuck in infinite loop
    end




    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    % Display if necessary
    % Remark: consider alternate printing options
    if PRINTING>=1,
        disp([sprintf('%5.0f %12.6g %12.6g ',generation-1, ...
            stats(generation,1),stats(generation,2)), ...
            sprintf('%12.6g %12.6g ',stats(generation,3), ...
            stats(generation,4))]);
    end

    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    % Check for termination
    % Remark: add more termination options
    STOP_FLAG = check_termination3(terminate,generation,stats,STOP_FLAG);

    if STOP_FLAG | OPT_STOP,
        fprintf('\n')
        if STOP_FLAG,
            disp('Genetic algorithm converged.')
        else
            disp('Genetic algorithm terminated by user.')
        end
        return % stops the function genetic.m
    end % end STOP_FLAG | OPT_STOP


    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    % Reproduce: selects individuals proportional to their fitness.
    % NEW_GEN is obtained from just a few rows of OLD_GEN.
    % The program kills the weakest individuals and replicates the
    % stronger individuals in numbers proportional to their fitness.
    % This method treats each organism as having a single chromosome.
    if 0,
        new_gen = reproduc(old_gen,fitness);

        % Mate: randomly reorders (mates) OLD_GEN (alias new_gen).
        new_gen = mate(new_gen);

        % Crossover: creates a NEW_GEN from OLD_GEN using crossover.
        new_gen = xover(new_gen,Pc);

        % Mutate: changes a gene of the OLD_GEN with probability Pm.
        new_gen = mutate(new_gen,Pm);

    else % branch actually executed, since the condition above is 0




        fitness_old_gen = fitness;
        % Reproduce: selects individuals proportional to their fitness.
        % NEW_GEN is obtained from just a few rows of OLD_GEN.
        % The program kills the weakest individuals and replicates the
        % stronger individuals in numbers proportional to their fitness.
        % new_gen = reproduc(old_gen, fitness);
        new_gen = old_gen;

        % Mate: randomly reorders (mates) OLD_GEN (alias new_gen).
        new_gen = mate(new_gen);
        % Crossover: creates a NEW_GEN from OLD_GEN using crossover.
        new_gen = xover(new_gen,Pc);
        % This method treats each organism as having as many chromosomes
        % as the coordinates of the function, i.e. length(bits)

        % Mutate: changes a gene of the OLD_GEN with probability Pm.
        new_gen = mutate(new_gen,Pm);

        % Fitness of new generation
        % Decode first if necessary
        if ENCODED,
            x_pop = decode(new_gen,vlb,vub,bits);
        else
            x_pop = new_gen;
        end

        for i=1:size_pop, % fitness
            x = x_pop(i,:);
            if i==size_pop,
                OPT_STEP = 1;
            else
                OPT_STEP = 0;
            end
            fitness_new_gen(i) = eval(evalstr);
        end

        % Natural selection: only the strongest individuals survive,
        % among all parents and progeny.
        new_gen = selection(size_pop,old_gen,new_gen, ...
            fitness_old_gen,fitness_new_gen);

    end
    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    if ENCODED, % if new_gen is binary-encoded
        lgen = decode(new_gen,vlb,vub,bits);
    else
        lgen = new_gen;
    end




    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    % Plotting the solution in the two-dimensional case
    if length(xopt)==2
        if any(generation==linspace(1,max_gen,10)),
            figure(1)
            hold on
            plot(xopt(1),xopt(2),'o')
            xlabel('{\it \theta}_{1}')
            ylabel('{\it \theta}_{2}')
            title('population')
            hold off
        end
        if any(generation == (2:10:max_gen)),
            figure(generation)
            plot(lgen(:,1),lgen(:,2),'bo',xopt(1),xopt(2),'r*')
            axis([0 pi/2 0 pi/2])
            grid
            xlabel('{\it \theta}_{1}')
            ylabel('{\it \theta}_{2}')
            title('population')
        end
    end
end % for max_gen
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Maximum number of generations reached without termination
if PRINTING>=1,
    fprintf('\n')
    disp('Maximum number of generations reached without termination')
    disp('criterion met. Either increase maximum generations')
    disp('or ease termination criterion.')
end
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

% Plotting the solution in the two-dimensional case
if length(xopt)==2
    figure(1)
    hold on
    plot(xopt(1),xopt(2),'r+')
    grid
    hold off
end


% end genetic 
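The branch of genetic.m that is actually executed (mate, crossover, mutation, then selection among parents and progeny) can be condensed into a short stand-alone script. This is only a sketch under stated assumptions: the fitness (number of ones in each chromosome) and all parameter values are hypothetical, and the four operators are written inline instead of calling the functions listed in this Appendix.

```matlab
% Assumed settings, chosen only for illustration
size_pop = 10; lchrom = 8; Pc = 1; Pm = 0.05; max_gen = 20;
pop  = double(rand(size_pop,lchrom) < 0.5);   % random binary initial population
best = zeros(1,max_gen);
for generation = 1:max_gen
    fit_old = sum(pop,2)';                    % hypothetical fitness: number of ones
    [junk, mating] = sort(rand(size_pop,1));  % mate: random reordering of the rows
    kids = pop(mating,:);
    for k = 1:size_pop/2                      % single-point crossover on each couple
        site = ceil(rand*(lchrom-1))*(rand < Pc);   % 0 means "no cut"
        pair = kids([2*k-1 2*k],:);
        kids([2*k-1 2*k],:) = [pair([1 2],1:site) pair([2 1],site+1:lchrom)];
    end
    flip = rand(size(kids)) < Pm;             % mutate: flip genes with probability Pm
    kids(flip) = 1 - kids(flip);
    fit_new = sum(kids,2)';
    pool = [pop; kids];                       % selection among parents and progeny
    [junk, I] = sort([fit_old fit_new]);      % ascending fitness
    pool = flipud(pool(I,:));                 % descending fitness
    pop  = pool(1:size_pop,:);                % keep the size_pop strongest
    best(generation) = max(sum(pop,2));
end
disp(best)
```

Because the best parent always competes with the progeny, the best fitness recorded in `best` cannot decrease from one generation to the next, which is the practical difference from the roulette-wheel branch disabled by `if 0`.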


AHHHHNHHNHHNMHNHHHAHNWHNHNHHHAHHAHMHNMHNH HHH HHNWHNHNWHNWHNH22% 




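%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%
% check_termination3.m
% Stop criterion of an algorithm
%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function STOP_FLAG = check_termination3(terminate,generation,stats,STOP_FLAG)
% terminate   convergence tolerance (options(2) in genetic.m)
% generation  index of the current generation (row of stats)
% stats       [max min mean std] of the fitness for each generation
% STOP_FLAG   set to 1 when the maximum fitness exceeds the mean
%             fitness of the current generation by less than terminate

if terminate>0,
    if stats(generation,1)-stats(generation,3) < terminate
        STOP_FLAG = 1;
    end
end % end terminate>0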
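As an illustration of this stop rule, the following sketch applies the same test (maximum fitness minus mean fitness below the tolerance) to a made-up statistics table. The numbers in `stats` and the value of `terminate` are hypothetical; the test is written inline so that the script runs on its own.

```matlab
% Hypothetical fitness statistics for three generations: [max min mean std]
stats = [0.90 0.10 0.500 0.200;
         0.95 0.60 0.800 0.100;
         0.96 0.94 0.955 0.005];
terminate = 0.01;        % assumed tolerance, as it would be read from options(2)
stop_flag = 0;
for generation = 1:size(stats,1)
    % same comparison as in check_termination3
    if terminate > 0 && stats(generation,1) - stats(generation,3) < terminate
        stop_flag = 1;
        break
    end
end
fprintf('Stopped at generation %d (flag = %d)\n', generation, stop_flag);
```

With these values the gap between maximum and mean is 0.4, 0.15 and 0.005, so the rule fires only at the third row, when the population has become nearly homogeneous.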


AHHHHHHNHHMHNHHHAHHHNWHNHHHHAHHAHMHAWHNHNHHAHMHNWHNHNWVHWVHNH226% 




%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%
% mate.m
%
% Randomly reorders (mates) OLD_GEN.
% [NEW_GEN,MATING] = MATE(OLD_GEN) performs random reordering
% on OLD_GEN. NEW_GEN is the new reordering. Individual in row 1 is
% to be mated with individual in row 2, etc. MATING is the reordering
% vector (ie: new_gen=old_gen(mating,:)).
%
% Copyright (c) 1993 by the MathWorks, Inc.
% Andrew Potvin 1-10-93.
%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function [new_gen,mating] = mate(old_gen) 


[junk,mating] = sort(rand(size(old_gen,1),1)); 
new_gen = old_gen(mating,:); 


% end mate 
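A small usage sketch of the reordering performed by mate.m is given below. The four-row population is hypothetical and the two statements of the function body are repeated inline, so the script does not depend on the other files.

```matlab
old_gen = [0 0 0; 0 1 1; 1 0 1; 1 1 0];          % four individuals, three genes each
[junk, mating] = sort(rand(size(old_gen,1),1));  % random permutation of the row indices
new_gen = old_gen(mating,:);
% Rows 1-2 and rows 3-4 of new_gen are the couples later handed to crossover.
disp(mating')
disp(new_gen)
```

Sorting a vector of random numbers and keeping only the index output is a compact way to draw a random permutation; no individual is lost or duplicated at this stage.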


AHHHHHHNHHNMHNHHHHHNWHNHHHHHHAHHAHMHMAHHHNWHNHNHNHHNWHAHNWNWV% 




%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%
% mutate.m
%
% Changes a gene of the OLD_GEN with probability Pm.
% [NEW_GEN,MUTATED] = MUTATE(OLD_GEN,Pm) performs random
% mutation on the population OLD_GEN. Each gene of each individual
% of the population can mutate independently with probability Pm.
% Genes are assumed to possess Boolean alleles.
% MUTATED contains the indices of the mutated genes.
%
% Copyright (c) 1993 by the MathWorks, Inc.
% Andrew Potvin 1-10-93.
%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function [new_gen,mutated] = mutate(old_gen,Pm) 


mutated = find(rand(size(old_gen))<Pm); 
new_gen = old_gen; 
new_gen(mutated) = 1-old_gen(mutated); 


% end mutate 
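The bit-flip rule of mutate.m can be tried on a tiny population as follows. The population and the mutation probability are hypothetical; the three statements of the function body are repeated inline.

```matlab
old_gen = [0 1 0 1; 1 1 0 0];               % two individuals, four Boolean genes each
Pm = 0.25;                                  % assumed mutation probability
mutated = find(rand(size(old_gen)) < Pm);   % linear indices of the genes that mutate
new_gen = old_gen;
new_gen(mutated) = 1 - old_gen(mutated);    % flip 0 <-> 1
changed = find(new_gen ~= old_gen);         % genes that actually differ
disp(new_gen)
```

Each gene is tested independently, so on average Pm times the number of genes (here 0.25 x 8 = 2) are flipped per call, and the genes that differ are exactly those listed in `mutated`.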


AHNHHHHNHHNHHNHHHHHHHMHNHNHAHHAHMWHNWHNHNAHAHHAHMWHNMHNW MH MHWVWVON% 




%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%
% reproduc.m
%
% Selects individuals proportional to their fitness.
% [NEW_GEN,SELECTED] = reproduc(OLD_GEN,FITNESS) selects
% individuals from OLD_GEN proportional to their FITNESS.
% NEW_GEN will have the same number of individuals as OLD_GEN.
% SELECTED contains the indices (rows) of the selected
% individuals (ie: NEW_GEN=OLD_GEN(SELECTED,:)).
%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%
% NEW_GEN is obtained from just a few rows of OLD_GEN.
% The elements of SELECTED are natural numbers ranging from 1 to size_pop.
% The elements of SELECTED can also be repeated.
%
% The program kills the weakest individuals and replicates the strongest
% individuals in numbers proportional to their fitness.
%
% Copyright (c) 1993 by the MathWorks, Inc.
% Andrew Potvin 1-10-93.
%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function [new_gen,selected] = reproduc(old_gen,fitness) 


norm_fit = fitness/sum(fitness); % row vector: length(norm_fit)=size_pop
                                 % (requires fitness>=0)
selected = rand(size(fitness));  % row vector: length(selected)=size_pop
sum_fit = 0;                     % initialization
for i=1:length(fitness),
    sum_fit = sum_fit + norm_fit(i);
    index = find(selected<sum_fit);         % row vector, possibly empty
    selected(index) = i*ones(size(index));  % row vector: length(selected)=size_pop
end

new_gen = old_gen(selected,:);


% end reproduc 
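The cumulative-sum loop of reproduc.m is a roulette-wheel selection. The sketch below traces it on three individuals; the fitness values are hypothetical and the random draws are replaced by fixed numbers (an assumption made only so that the result can be followed by hand).

```matlab
fitness  = [1 3 6];                 % hypothetical non-negative fitness values
norm_fit = fitness/sum(fitness);    % selection probabilities: 0.1 0.3 0.6
selected = [0.05 0.55 0.95];        % fixed stand-ins for rand(size(fitness))
sum_fit  = 0;
for i = 1:length(fitness)
    sum_fit = sum_fit + norm_fit(i);      % upper edge of the i-th wheel sector
    index = find(selected < sum_fit);     % draws falling in this sector
    selected(index) = i;                  % replace the draw by the row index
end
disp(selected)                            % prints 1 3 3
```

The draw 0.05 falls in the first sector (0 to 0.1), while 0.55 and 0.95 both fall in the third (0.4 to 1), so individual 2 disappears and individual 3 is copied twice. Once a draw has been replaced by an index it is at least 1 and can no longer satisfy `selected < sum_fit`, which is why the loop never reassigns it.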


AHHHHMHHHMHNHHHHMHNHNHAHMHAHHAHNMHMHNHNHHAHHHNWHNHNWAHNWVHNH2026% 




%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%
% selection.m
% Natural selection: only the strongest individuals survive, among
% all parents and progeny
%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% INPUT
% size_pop: size of the population
% old_gen: old generation
% new_gen: new generation
% fitness_old_gen: fitness of the old generation
% fitness_new_gen: fitness of the new generation
%
% OUTPUT
% new_gen: the size_pop strongest individuals of the old and new generations
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function new_gen = selection(size_pop,old_gen,new_gen,fitness_old_gen,fitness_new_gen)

fitness = [fitness_old_gen fitness_new_gen]; % row vector
new_gen = [old_gen; new_gen];

[dummy,I] = sort(fitness);

new_gen = new_gen(I,:);           % from the lesser to the greater
new_gen = flipud(new_gen);        % from the greater to the lesser
new_gen = new_gen(1:size_pop,:);  % natural selection: only the strongest
                                  % individuals survive


% end selection 
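A worked example of this survival rule, with two parents and two children, is sketched below. All chromosomes and fitness values are hypothetical, and the statements of selection.m are repeated inline under different variable names (`pool`, `survivors`).

```matlab
size_pop = 2;
old_gen = [0 0; 0 1];   fitness_old_gen = [0.2 0.9];   % parents
new_gen = [1 0; 1 1];   fitness_new_gen = [0.5 0.1];   % progeny
fitness = [fitness_old_gen fitness_new_gen];
pool = [old_gen; new_gen];
[dummy, I] = sort(fitness);       % ascending: I = [4 1 3 2]
pool = flipud(pool(I,:));         % descending fitness
survivors = pool(1:size_pop,:);   % the size_pop strongest individuals
disp(survivors)
```

The survivors are the parent `[0 1]` (fitness 0.9) and the child `[1 0]` (fitness 0.5): a strong parent is never displaced by weaker progeny.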


AHHHHHHNHHNMHNHHHAHHHNWHNHAHHAHHAHMHNWHNMHNMHAHNWHNWHNHNWAHHWVHWH2026% 




%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%
% xover.m
%
% Creates a NEW_GEN from OLD_GEN using crossover.
%
% [NEW_GEN,SITES] = XOVER(OLD_GEN,Pc) performs crossover
% procreation on pairs of OLD_GEN with probability Pc.
% Crossover SITES are chosen at random
% (ie: there will be half as many SITES as there are individuals).
%
% Copyright (c) 1993 by the MathWorks, Inc.
% Andrew Potvin 1-10-93.
%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% INPUT
% old_gen: old generation
% Pc: crossover probability
%
% OUTPUT
% new_gen: new generation
% sites: location of chromosome cuts
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Single point crossover.
% Each row of old_gen represents an organism.
% In each row, the columns are the genes.
% Each organism has a single binary chromosome.
% Organisms are even in number.
% The length of SITES equals half the number of rows in OLD_GEN.


function [new_gen,sites] = xover(old_gen,Pc) 
lchrom = size(old_gen,2); % length of the binary chromosome

sites = ceil(rand(size(old_gen,1)/2,1)*(lchrom-1)); % position of chromosome cuts
sites = sites.*(rand(size(sites))<Pc);              % cuts applied with probability Pc

for i = 1:length(sites),
    new_gen([2*i-1 2*i],:) = [old_gen([2*i-1 2*i],1:sites(i)) ...
        old_gen([2*i 2*i-1],sites(i)+1:lchrom)];
end


% end xover 
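Single-point crossover on one couple can be traced by hand with the sketch below. The two parents are hypothetical and the cut position is fixed at 2 (in xover.m it is drawn at random between 1 and lchrom-1, or set to 0 when no crossover takes place).

```matlab
parents = [1 1 1 1 1 1;
           0 0 0 0 0 0];         % one couple, six genes each
lchrom = size(parents,2);
site = 2;                        % assumed cut position
% heads stay in place, tails are swapped between the two partners
children = [parents([1 2],1:site) parents([2 1],site+1:lchrom)];
disp(children)
```

The result is `[1 1 0 0 0 0; 0 0 1 1 1 1]`: each child keeps the first two genes of one parent and inherits the remaining four from the other.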


AHHHHMHHNHNMHNHHHHHNWHNHNHHHAHHAHMHNMHNHNHAHHWHNWHNHNHWAHNWHNHW26% 




%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%
% decode.m
%
% Copyright (c) 1993 by the MathWorks, Inc.
% Andrew Potvin 1-10-93.
%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% DECODE Converts from binary to variable representation.
% [X,COARSE] = DECODE(GEN,VLB,VUB,BITS) converts the binary
% population GEN to variable representation. Each individual
% of GEN should have SUM(BITS) genes. Each individual binary string
% encodes LENGTH(VLB)=LENGTH(VUB)=LENGTH(BITS) variables.
% COARSE is the coarseness of the binary mapping and is also
% of length LENGTH(VUB).
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function [x,coarse] = decode(gen,vlb,vub,bits)

bit_count = 0;
two_pow = 2.^(0:max(bits))';
for i=1:length(bits),
    pow_mat((1:bits(i))+bit_count,i) = two_pow(bits(i):-1:1);
    bit_count = bit_count + bits(i);
end

gen_row = size(gen,1);
coarse = (vub-vlb)./((2.^bits)-1);
inc = ones(gen_row,1)*coarse;
x = ones(gen_row,1)*vlb + (gen*pow_mat).*inc;


% end decode 
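For a single variable the mapping performed by decode.m reduces to a weighted sum of the bits, rescaled to the interval [vlb, vub]. The sketch below assumes hypothetical bounds 0 and 15 and 4 bits, so that the grid step is exactly 1 and the result can be checked by eye.

```matlab
vlb = 0; vub = 15; bits = 4;                 % assumed bounds and bit length
gen = [0 0 0 0;
       0 1 0 1;
       1 1 1 1];                             % three binary chromosomes
coarse  = (vub - vlb)./(2.^bits - 1);        % grid step: 15/15 = 1
weights = 2.^(bits-1:-1:0)';                 % [8; 4; 2; 1], most significant bit first
x = vlb + (gen*weights).*coarse;             % decoded values
disp(x')                                     % prints 0 5 15
```

The all-zero chromosome maps to the lower bound, the all-one chromosome to the upper bound, and every other string to an equally spaced point in between; increasing `bits` refines the grid.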
YUKU UONUUHYUUNUULKUUUHUUKWUUNHO%NU HN %% 




%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%
% encode.m
%
% Copyright (c) 1993 by the MathWorks, Inc.
% Andrew Potvin 1-10-93.
%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% ENCODE Converts from variable to binary representation.
% [GEN,LCHROM,COARSE,nround] = ENCODE(X,VLB,VUB,BITS)
% encodes non-binary variables of X to binary. The variables
% in the i'th column of X will be encoded by BITS(i) bits.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%
function [gen,lchrom,coarse,nround] = encode(x,vlb,vub,bits)
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% VLB and VUB are the lower and upper bounds on X.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% OUTPUT
% GEN is the binary representation of these X.
% LCHROM=SUM(BITS) is the length of the binary chromosome.
% COARSE(i) is the coarseness of the i'th variable as determined by the
% variable ranges and BITS(i).
% NROUND contains the absolute indices of the X which were rounded due to
% finite BIT length.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
lchrom = sum(bits);
coarse = (vub-vlb)./((2.^bits)-1);
[x_row,x_col] = size(x);

gen = [];
if ~isempty(x),
    temp = (x-ones(x_row,1)*vlb)./ ...
        (ones(x_row,1)*coarse);
    b10 = round(temp);
    % Since temp and b10 should contain integers 1e-4 is close enough
    nround = find(b10-temp>1e-4);
    gen = b10to2(b10,bits);
end


% end encode 
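The first step of encode.m, before the conversion to base 2, maps each variable onto an integer grid level. The sketch below shows this for one individual with two variables; the bounds, bit lengths and values are hypothetical, and the last line only illustrates what a subsequent decoding would give back.

```matlab
vlb = [0 -1]; vub = [7 1]; bits = [3 2];   % assumed bounds and bit lengths
x = [5 0.5];                               % one individual, two variables
coarse = (vub - vlb)./(2.^bits - 1);       % grid steps: [1 2/3]
b10 = round((x - vlb)./coarse);            % integer grid levels: [5 2]
x_back = vlb + b10.*coarse;                % values recovered after decoding
disp(b10)
disp(x_back)
```

The first variable lies exactly on the grid and is recovered unchanged. The second one does not: 0.5 corresponds to level 2.25, which is rounded to level 2 and decoded as 1/3. With only 2 bits the step is 2/3, so a larger `bits(2)` would be needed for a finer representation.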


AHHHHHHNHHNMHNHHHHNMHNHHNHAHHAHHAHMHNMHNHHNMWAHNWHNWHNHNWHNWHNHV26% 




%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%
% b10to2.m
%
% Copyright (c) 1993 by the MathWorks, Inc.
% Andrew Potvin 1-10-93.
%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% B10TO2 Converts base 10 to base 2.
% X = B10TO2(N,BITS) returns a vector of size BITS of the binary
% representation of the base 10 integer N. If N is a matrix,
% BITS must be a row vector with as many columns as N. X will
% then be of size SIZE(N,1)-by-SUM(BITS).
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function b2 = b10to2(b10,bits)

bit_count = 0;
b2_index = [];
bits_index = 1:length(bits);
for i=bits_index,
    bit_count = bit_count + bits(i);
    b2_index = [b2_index bit_count];
end

for i=1:max(bits),
    r = rem(b10,2);
    b2(:,b2_index) = r;   % store the current remainders (least significant bit first)

    b10 = fix(b10/2);
    tbe = find( all(b10==0) | (bits(bits_index)==i) );
    if ~isempty(tbe),
        b10(:,tbe) = [];
        b2_index(tbe) = [];
        bits_index(tbe) = [];
    end

    % Quick quit if all b10 small compared to bit length
    if isempty(bits_index),
        return
    end

    b2_index = b2_index-1;
end

% end b10to2
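The conversion relies on repeated division by 2, storing the remainders from the least significant position backwards. A scalar version of the same idea is sketched below; the number 5 and the length of 4 bits are arbitrary, and the built-in `dec2bin` is used only as an independent cross-check.

```matlab
n = 5; nbits = 4;              % hypothetical integer and bit length
b2 = zeros(1,nbits);
for k = nbits:-1:1             % fill from the least significant position
    b2(k) = rem(n,2);          % current remainder is the next bit
    n = fix(n/2);              % integer division by 2
end
disp(b2)                       % prints 0 1 0 1
check = dec2bin(5,nbits) - '0';   % same digits from the built-in conversion
```

b10to2.m applies this procedure to all the columns of a matrix at once and removes a column from the computation as soon as its bits are exhausted.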
HYYYVKVKUUMWWHHHHHYYHYLYOYUMUHHHHHNYYYYYHLLHU%HWHH%I% 




%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%
% sph2cartN.m
%
% Transforms corresponding elements of data stored in spherical
% coordinates to Cartesian coordinates X.
%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% The position of a point P is identified by the radius vector rho in
% (n+2)-dimensional space, by the angle phi formed by the projection of
% the radius vector on the plane OX1X2 with the positive semi-axis of X1,
% and by the angles th(i) formed by the radius vector with the positive
% semi-axes:
% - X(3)   --> th(n)
% - X(4)   --> th(n-1)
% - ...
% - X(n+1) --> th(2)
% - X(n+2) --> th(1)
%
% P(rho,phi,th(1,2,...,n)) --> P(X(1),X(2),...,X(n),X(n+1),X(n+2))
% In the case of a single point:
% phi (scalar)
% rho (scalar)
% th (column vector of length n)
% X (column vector of length n+2)
% In the case of N points:
% phi (row vector of length N)
% rho (row vector of length N)
% th (matrix with n rows and N columns)
% X (matrix with n+2 rows and N columns)
% rho>=0
% 0<=phi<=2*pi (in radians)
% 0<=th(i)<=pi (i=1,2,...,n) (in radians)
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function X=sph2cartN(rho,phi,th) 


n=size(th, 1); 

sinth=sin(th); 

costh=cos(th); 

sinphi=sin(phi); 

cosphi=cos(phi); 

costh=flipud(costh); 

rho=repmat(rho,n+2,1); 

X=cumprod(sinth, 1); 

X=flipud(X); 

X=[X(1,:); X; ones(size(phi))]; 

X=[X(1,:).*cosphi; X(2,:).*sinphi; X(3:n+2,:).*costh]; 
X=rho.*X; 

% end sph2cartN 
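A quick numerical check of this transformation is that the Cartesian point must lie at distance rho from the origin. The sketch below does so for a single point with n = 2 angles (four coordinates); the values of rho, phi and th are arbitrary, and the statements of sph2cartN.m are condensed inline with a hypothetical auxiliary variable `P`.

```matlab
rho = 2; phi = pi/3; th = [pi/4; pi/6];   % one point, n = 2 angles
n = size(th,1);
P = flipud(cumprod(sin(th),1));           % products of sines, longest product first
X = [P(1,:).*cos(phi);                    % X(1)
     P(1,:).*sin(phi);                    % X(2)
     [P(2:n,:); ones(size(phi))].*flipud(cos(th))];   % X(3), ..., X(n+2)
X = repmat(rho,n+2,1).*X;
disp(X')
disp(norm(X))                             % equals rho up to rounding error
```

Here X(4) = rho*cos(th(1)) and X(3) = rho*sin(th(1))*cos(th(2)), in agreement with the correspondence between coordinates and angles listed in the header of the function.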

POOL OL OL SA Sa SAA SASK SSL SASK SA SKOL SALSA SASL SL SLL SDSL OL SL SASL SSL SK SASK SLL SLL L SSK LLU SKIL 


ISBN: 978-2-931089-06-4



Avenue du CASTEL 87, 1200 BRUXELLES (Belgium) 
D/2020/15070/07 


© Copyright 2020 - International Academic Research Center Str. 
& European Tourism Quality Association sbl 


This book proposes a method to quantify ordinal variables through the optimization of an
objective function.

Various methods for quantifying (scaling) ordinal statistical variables are known, but if the
researcher wishes to make comparisons between two or more groups regarding the same feature,
the differences between their distributions should be optimized, whether they concern assessments,
attitudes, opinions, or other features. Doing so requires optimizing linear and quadratic forms
subject to linear inequality constraints and quadratic equality constraints.

This book suggests a solution to the problem, namely the use of genetic algorithms. Genetic
algorithms need no information about the gradient of the objective function, and they are designed
to avoid stopping at relative (non-absolute) extremes. The book also implements the rules for
deciding whether average evaluations are equal or not.

The next section of this book applies the above technique to quantify ordinal statistical
variables. The method follows clearly defined objective rules and is therefore independent of the
researcher's will, which makes it more reliable and more consistent than other quantification
methods already present in the literature. The method used to compare average evaluations
expressed at a qualitative ordinal level is described in the first part of the book. The method is
then validated by comparing the opinions expressed by a sample of university graduates about the
effectiveness of university education in terms of job exploitability; the interviewees were
divided according to the Faculties they attended as students and to their current job condition.

A broad Appendix at the end of the book includes the MATLAB® code expressly written to
perform this method.


10,00 €


